<!DOCTYPE html>
<html lang="" xml:lang="">
<head>

  <meta charset="utf-8" />
  <meta http-equiv="X-UA-Compatible" content="IE=edge" />
  <title>Chapter 2 Basic Syntax | 极客R：数据分析之道</title>
  <meta name="description" content="A minimalist introductory book on R" />
  <meta name="generator" content="bookdown 0.18 and GitBook 2.6.7" />

  <meta property="og:title" content="Chapter 2 Basic Syntax | 极客R：数据分析之道" />
  <meta property="og:type" content="book" />
  
  <meta property="og:image" content="cover.png" />
  <meta property="og:description" content="A minimalist introductory book on R" />
  <meta name="github-repo" content="ShixiangWang/geek-r-tutorial" />

  <meta name="twitter:card" content="summary" />
  <meta name="twitter:title" content="Chapter 2 Basic Syntax | 极客R：数据分析之道" />
  
  <meta name="twitter:description" content="A minimalist introductory book on R" />
  <meta name="twitter:image" content="cover.png" />

<meta name="author" content="王诗翔, 生信技能树" />



  <meta name="viewport" content="width=device-width, initial-scale=1" />
  <meta name="apple-mobile-web-app-capable" content="yes" />
  <meta name="apple-mobile-web-app-status-bar-style" content="black" />
  
  
<link rel="prev" href="prepare.html"/>
<link rel="next" href="import.html"/>
<script src="libs/jquery/jquery.min.js"></script>
<link href="libs/gitbook/css/style.css" rel="stylesheet" />
<link href="libs/gitbook/css/plugin-table.css" rel="stylesheet" />
<link href="libs/gitbook/css/plugin-bookdown.css" rel="stylesheet" />
<link href="libs/gitbook/css/plugin-highlight.css" rel="stylesheet" />
<link href="libs/gitbook/css/plugin-search.css" rel="stylesheet" />
<link href="libs/gitbook/css/plugin-fontsettings.css" rel="stylesheet" />
<link href="libs/gitbook/css/plugin-clipboard.css" rel="stylesheet" />











<style type="text/css">
pre > code.sourceCode { white-space: pre; position: relative; }
pre > code.sourceCode > span { display: inline-block; line-height: 1.25; }
pre > code.sourceCode > span:empty { height: 1.2em; }
code.sourceCode > span { color: inherit; text-decoration: inherit; }
pre.sourceCode { margin: 0; }
@media screen {
div.sourceCode { overflow: auto; }
}
@media print {
pre > code.sourceCode { white-space: pre-wrap; }
pre > code.sourceCode > span { text-indent: -5em; padding-left: 5em; }
}
pre.numberSource code
  { counter-reset: source-line 0; }
pre.numberSource code > span
  { position: relative; left: -4em; counter-increment: source-line; }
pre.numberSource code > span > a:first-child::before
  { content: counter(source-line);
    position: relative; left: -1em; text-align: right; vertical-align: baseline;
    border: none; display: inline-block;
    -webkit-touch-callout: none; -webkit-user-select: none;
    -khtml-user-select: none; -moz-user-select: none;
    -ms-user-select: none; user-select: none;
    padding: 0 4px; width: 4em;
    color: #aaaaaa;
  }
pre.numberSource { margin-left: 3em; border-left: 1px solid #aaaaaa;  padding-left: 4px; }
div.sourceCode
  {   }
@media screen {
pre > code.sourceCode > span > a:first-child::before { text-decoration: underline; }
}
code span.al { color: #ff0000; font-weight: bold; } /* Alert */
code span.an { color: #60a0b0; font-weight: bold; font-style: italic; } /* Annotation */
code span.at { color: #7d9029; } /* Attribute */
code span.bn { color: #40a070; } /* BaseN */
code span.bu { } /* BuiltIn */
code span.cf { color: #007020; font-weight: bold; } /* ControlFlow */
code span.ch { color: #4070a0; } /* Char */
code span.cn { color: #880000; } /* Constant */
code span.co { color: #60a0b0; font-style: italic; } /* Comment */
code span.cv { color: #60a0b0; font-weight: bold; font-style: italic; } /* CommentVar */
code span.do { color: #ba2121; font-style: italic; } /* Documentation */
code span.dt { color: #902000; } /* DataType */
code span.dv { color: #40a070; } /* DecVal */
code span.er { color: #ff0000; font-weight: bold; } /* Error */
code span.ex { } /* Extension */
code span.fl { color: #40a070; } /* Float */
code span.fu { color: #06287e; } /* Function */
code span.im { } /* Import */
code span.in { color: #60a0b0; font-weight: bold; font-style: italic; } /* Information */
code span.kw { color: #007020; font-weight: bold; } /* Keyword */
code span.op { color: #666666; } /* Operator */
code span.ot { color: #007020; } /* Other */
code span.pp { color: #bc7a00; } /* Preprocessor */
code span.sc { color: #4070a0; } /* SpecialChar */
code span.ss { color: #bb6688; } /* SpecialString */
code span.st { color: #4070a0; } /* String */
code span.va { color: #19177c; } /* Variable */
code span.vs { color: #4070a0; } /* VerbatimString */
code span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } /* Warning */
</style>

<link rel="stylesheet" href="css/style.css" type="text/css" />
</head>

<body>



  <div class="book without-animation with-summary font-size-2 font-family-1" data-basepath=".">

    <div class="book-summary">
      <nav role="navigation">

<ul class="summary">
<li><a href="./">数据分析之道</a></li>

<li class="divider"></li>
<li class="chapter" data-level="" data-path="index.html"><a href="index.html"><i class="fa fa-check"></i>Preface</a>
<ul>
<li class="chapter" data-level="" data-path="index.html"><a href="index.html#内容简介"><i class="fa fa-check"></i>Overview</a></li>
<li class="chapter" data-level="" data-path="index.html"><a href="index.html#许可"><i class="fa fa-check"></i>License</a></li>
<li class="chapter" data-level="" data-path="index.html"><a href="index.html#建议与反馈"><i class="fa fa-check"></i>Suggestions and Feedback</a></li>
<li class="chapter" data-level="" data-path="index.html"><a href="index.html#致谢"><i class="fa fa-check"></i>Acknowledgements</a></li>
</ul></li>
<li class="chapter" data-level="" data-path="author.html"><a href="author.html"><i class="fa fa-check"></i>About the Author</a></li>
<li class="chapter" data-level="1" data-path="prepare.html"><a href="prepare.html"><i class="fa fa-check"></i><b>1</b> Getting Ready</a>
<ul>
<li class="chapter" data-level="1.1" data-path="prepare.html"><a href="prepare.html#r-的下载和安装"><i class="fa fa-check"></i><b>1.1</b> Downloading and Installing R</a></li>
<li class="chapter" data-level="1.2" data-path="prepare.html"><a href="prepare.html#rstudio-的下载和安装"><i class="fa fa-check"></i><b>1.2</b> Downloading and Installing RStudio</a></li>
<li class="chapter" data-level="1.3" data-path="prepare.html"><a href="prepare.html#配置可选"><i class="fa fa-check"></i><b>1.3</b> Configuration (Optional)</a></li>
<li class="chapter" data-level="1.4" data-path="prepare.html"><a href="prepare.html#常见问题与方案"><i class="fa fa-check"></i><b>1.4</b> Common Problems and Solutions</a>
<ul>
<li class="chapter" data-level="1.4.1" data-path="prepare.html"><a href="prepare.html#r-在-linux-系统下的安装"><i class="fa fa-check"></i><b>1.4.1</b> Installing R on Linux</a></li>
<li class="chapter" data-level="1.4.2" data-path="prepare.html"><a href="prepare.html#rtools-安装"><i class="fa fa-check"></i><b>1.4.2</b> Installing Rtools</a></li>
<li class="chapter" data-level="1.4.3" data-path="prepare.html"><a href="prepare.html#rstudio-server-安装"><i class="fa fa-check"></i><b>1.4.3</b> Installing RStudio Server</a></li>
</ul></li>
</ul></li>
<li class="chapter" data-level="2" data-path="base.html"><a href="base.html"><i class="fa fa-check"></i><b>2</b> Basic Syntax</a>
<ul>
<li class="chapter" data-level="2.1" data-path="base.html"><a href="base.html#基本数据结构"><i class="fa fa-check"></i><b>2.1</b> Basic Data Structures</a>
<ul>
<li class="chapter" data-level="2.1.1" data-path="base.html"><a href="base.html#向量"><i class="fa fa-check"></i><b>2.1.1</b> Vectors</a></li>
<li class="chapter" data-level="2.1.2" data-path="base.html"><a href="base.html#数组与矩阵"><i class="fa fa-check"></i><b>2.1.2</b> Arrays and Matrices</a></li>
<li class="chapter" data-level="2.1.3" data-path="base.html"><a href="base.html#数据框"><i class="fa fa-check"></i><b>2.1.3</b> Data Frames</a></li>
<li class="chapter" data-level="2.1.4" data-path="base.html"><a href="base.html#列表"><i class="fa fa-check"></i><b>2.1.4</b> Lists</a></li>
</ul></li>
<li class="chapter" data-level="2.2" data-path="base.html"><a href="base.html#控制结构"><i class="fa fa-check"></i><b>2.2</b> Control Structures</a>
<ul>
<li class="chapter" data-level="2.2.1" data-path="base.html"><a href="base.html#条件控制"><i class="fa fa-check"></i><b>2.2.1</b> Conditional Control</a></li>
<li class="chapter" data-level="2.2.2" data-path="base.html"><a href="base.html#循环控制"><i class="fa fa-check"></i><b>2.2.2</b> Loop Control</a></li>
</ul></li>
<li class="chapter" data-level="2.3" data-path="base.html"><a href="base.html#函数与函数式编程"><i class="fa fa-check"></i><b>2.3</b> Functions and Functional Programming</a>
<ul>
<li class="chapter" data-level="2.3.1" data-path="base.html"><a href="base.html#创建和使用函数"><i class="fa fa-check"></i><b>2.3.1</b> Creating and Using Functions</a></li>
<li class="chapter" data-level="2.3.2" data-path="base.html"><a href="base.html#函数式编程"><i class="fa fa-check"></i><b>2.3.2</b> Functional Programming</a></li>
</ul></li>
<li class="chapter" data-level="2.4" data-path="base.html"><a href="base.html#三方包的安装与加载"><i class="fa fa-check"></i><b>2.4</b> Installing and Loading Third-Party Packages</a>
<ul>
<li class="chapter" data-level="2.4.1" data-path="base.html"><a href="base.html#cran"><i class="fa fa-check"></i><b>2.4.1</b> CRAN</a></li>
<li class="chapter" data-level="2.4.2" data-path="base.html"><a href="base.html#bioconductor"><i class="fa fa-check"></i><b>2.4.2</b> Bioconductor</a></li>
<li class="chapter" data-level="2.4.3" data-path="base.html"><a href="base.html#github-等-git-库"><i class="fa fa-check"></i><b>2.4.3</b> GitHub and Other Git Repositories</a></li>
<li class="chapter" data-level="2.4.4" data-path="base.html"><a href="base.html#包使用"><i class="fa fa-check"></i><b>2.4.4</b> Using Packages</a></li>
</ul></li>
<li class="chapter" data-level="2.5" data-path="base.html"><a href="base.html#编程实战roc-曲线计算与绘制"><i class="fa fa-check"></i><b>2.5</b> Programming Practice: Computing and Plotting ROC Curves</a></li>
<li class="chapter" data-level="2.6" data-path="base.html"><a href="base.html#常见问题与方案-1"><i class="fa fa-check"></i><b>2.6</b> Common Problems and Solutions</a>
<ul>
<li class="chapter" data-level="2.6.1" data-path="base.html"><a href="base.html#与---的区别"><i class="fa fa-check"></i><b>2.6.1</b> The Difference Between = and &lt;-</a></li>
<li class="chapter" data-level="2.6.2" data-path="base.html"><a href="base.html#因子重构"><i class="fa fa-check"></i><b>2.6.2</b> Rebuilding Factors</a></li>
</ul></li>
</ul></li>
<li class="chapter" data-level="3" data-path="import.html"><a href="import.html"><i class="fa fa-check"></i><b>3</b> Data Import</a>
<ul>
<li class="chapter" data-level="" data-path="import.html"><a href="import.html#csv"><i class="fa fa-check"></i>CSV</a></li>
<li class="chapter" data-level="" data-path="import.html"><a href="import.html#excel"><i class="fa fa-check"></i>Excel</a></li>
<li class="chapter" data-level="" data-path="import.html"><a href="import.html#常见问题与方案-2"><i class="fa fa-check"></i>Common Problems and Solutions</a></li>
</ul></li>
<li class="chapter" data-level="4" data-path="clean.html"><a href="clean.html"><i class="fa fa-check"></i><b>4</b> Data Cleaning</a>
<ul>
<li class="chapter" data-level="" data-path="clean.html"><a href="clean.html#常见问题与方案-3"><i class="fa fa-check"></i>Common Problems and Solutions</a></li>
</ul></li>
<li class="chapter" data-level="5" data-path="visualization.html"><a href="visualization.html"><i class="fa fa-check"></i><b>5</b> Data Visualization</a>
<ul>
<li class="chapter" data-level="" data-path="visualization.html"><a href="visualization.html#常见问题与方案-4"><i class="fa fa-check"></i>Common Problems and Solutions</a></li>
</ul></li>
<li class="chapter" data-level="6" data-path="model.html"><a href="model.html"><i class="fa fa-check"></i><b>6</b> Statistical Modeling</a>
<ul>
<li class="chapter" data-level="" data-path="model.html"><a href="model.html#常见问题与方案-5"><i class="fa fa-check"></i>Common Problems and Solutions</a></li>
</ul></li>
<li class="chapter" data-level="7" data-path="report.html"><a href="report.html"><i class="fa fa-check"></i><b>7</b> Presenting Results</a>
<ul>
<li class="chapter" data-level="" data-path="report.html"><a href="report.html#图形"><i class="fa fa-check"></i>Figures</a></li>
<li class="chapter" data-level="" data-path="report.html"><a href="report.html#表格"><i class="fa fa-check"></i>Tables</a></li>
<li class="chapter" data-level="" data-path="report.html"><a href="report.html#rmarkdown"><i class="fa fa-check"></i>RMarkdown</a></li>
<li class="chapter" data-level="" data-path="report.html"><a href="report.html#shiny"><i class="fa fa-check"></i>Shiny</a></li>
<li class="chapter" data-level="" data-path="report.html"><a href="report.html#常见问题与方案-6"><i class="fa fa-check"></i>Common Problems and Solutions</a></li>
</ul></li>
<li class="chapter" data-level="8" data-path="bioapp.html"><a href="bioapp.html"><i class="fa fa-check"></i><b>8</b> Bioinformatics Applications</a></li>
<li class="appendix"><span><b>Appendix</b></span></li>
<li class="chapter" data-level="A" data-path="expand-reading.html"><a href="expand-reading.html"><i class="fa fa-check"></i><b>A</b> Further Reading</a>
<ul>
<li class="chapter" data-level="A.1" data-path="expand-reading.html"><a href="expand-reading.html#生信技能树语雀知识库"><i class="fa fa-check"></i><b>A.1</b> 生信技能树 Yuque Knowledge Base</a></li>
<li class="chapter" data-level="A.2" data-path="expand-reading.html"><a href="expand-reading.html#图书"><i class="fa fa-check"></i><b>A.2</b> Books</a>
<ul>
<li class="chapter" data-level="A.2.1" data-path="expand-reading.html"><a href="expand-reading.html#问题与方案"><i class="fa fa-check"></i><b>A.2.1</b> Problems and Solutions</a></li>
<li class="chapter" data-level="A.2.2" data-path="expand-reading.html"><a href="expand-reading.html#统计建模"><i class="fa fa-check"></i><b>A.2.2</b> Statistical Modeling</a></li>
<li class="chapter" data-level="A.2.3" data-path="expand-reading.html"><a href="expand-reading.html#核心集合"><i class="fa fa-check"></i><b>A.2.3</b> Core Collection</a></li>
<li class="chapter" data-level="A.2.4" data-path="expand-reading.html"><a href="expand-reading.html#生物信息学"><i class="fa fa-check"></i><b>A.2.4</b> Bioinformatics</a></li>
<li class="chapter" data-level="A.2.5" data-path="expand-reading.html"><a href="expand-reading.html#r"><i class="fa fa-check"></i><b>A.2.5</b> R</a></li>
</ul></li>
<li class="chapter" data-level="A.3" data-path="expand-reading.html"><a href="expand-reading.html#视频"><i class="fa fa-check"></i><b>A.3</b> Videos</a></li>
<li class="chapter" data-level="A.4" data-path="expand-reading.html"><a href="expand-reading.html#公众号"><i class="fa fa-check"></i><b>A.4</b> WeChat Official Accounts</a></li>
<li class="chapter" data-level="A.5" data-path="expand-reading.html"><a href="expand-reading.html#其他资料"><i class="fa fa-check"></i><b>A.5</b> Other Resources</a></li>
</ul></li>
<li class="chapter" data-level="" data-path="references.html"><a href="references.html"><i class="fa fa-check"></i>References</a></li>
</ul>

      </nav>
    </div>

    <div class="book-body">
      <div class="body-inner">
        <div class="book-header" role="navigation">
          <h1>
            <i class="fa fa-circle-o-notch fa-spin"></i><a href="./">极客R：数据分析之道</a>
          </h1>
        </div>

        <div class="page-wrapper" tabindex="-1" role="main">
          <div class="page-inner">

            <section class="normal" id="section-">
<div id="base" class="section level1">
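<h1><span class="header-section-number">Chapter 2</span> Basic Syntax</h1>
<p>“Program = algorithm + data structure”: <strong>data structures</strong> carry information, while <strong>algorithms</strong> are the steps needed to complete a task. How the two are built and used gives each programming language its distinctive syntax. This chapter first introduces R's basic data structures, then conditional and loop control, then creating functions and using extension packages, and closes with a programming exercise that puts these points into practice.</p>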
<div id="基本数据结构" class="section level2">
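<h2><span class="header-section-number">2.1</span> Basic Data Structures</h2>
<p>To represent real-world information, programming languages usually provide three basic data types: <strong>numeric</strong>, covering integers and floating-point numbers; <strong>character</strong>, for text; and <strong>logical</strong>, often called Boolean, for yes/no judgments such as right or wrong. Beyond its implementations of these basic types, R itself also provides more complex data structures such as matrices, data frames, and lists, which make computation convenient and cover the kinds of data in common use.</p>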
<div id="向量" class="section level3">
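<h3><span class="header-section-number">2.1.1</span> Vectors</h3>
<p>In R, computation on data is usually carried out on vectors. A <strong>vector</strong> is a group of homogeneous pieces of information, such as 20 numbers or 30 strings (similar to, but not the same as, a vector in the mathematical sense). A single piece of information is called an <strong>element</strong>. A <strong>scalar</strong> can be regarded as a vector with exactly one element.</p>
<p>Next, we get to know vectors and work with them hands-on, one element data type at a time.</p>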
<div id="数值" class="section level4">
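<h4><span class="header-section-number">2.1.1.1</span> Numeric Values</h4>
<p>Numbers are arguably the most common way of representing information, for example a person's height or age. In R, a numeric value is created simply by writing it in the Arabic numerals learned in primary school, such as 3.14, an approximation of <span class="math inline">\(\pi\)</span>:</p>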
<div class="sourceCode" id="cb17"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb17-1"><a href="base.html#cb17-1"></a><span class="fl">3.14</span></span>
<span id="cb17-2"><a href="base.html#cb17-2"></a><span class="co">#&gt; [1] 3.14</span></span></code></pre></div>
<blockquote>
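<p>Here, the text after <code>#&gt;</code> is the result R returns after running the code. <code>[1]</code> is the index of the result, shown to help the reader; here it indicates that the first value of the result is 3.14.</p>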
</blockquote>
<p><code>typeof()</code> and <code>class()</code> are two functions that beginners will find very useful; they return type information about a piece of data.</p>
<div class="sourceCode" id="cb18"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb18-1"><a href="base.html#cb18-1"></a><span class="kw">typeof</span>(<span class="fl">3.14</span>)</span>
<span id="cb18-2"><a href="base.html#cb18-2"></a><span class="co">#&gt; [1] &quot;double&quot;</span></span>
<span id="cb18-3"><a href="base.html#cb18-3"></a><span class="kw">class</span>(<span class="fl">3.14</span>)</span>
<span id="cb18-4"><a href="base.html#cb18-4"></a><span class="co">#&gt; [1] &quot;numeric&quot;</span></span></code></pre></div>
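<p>Unlike some other languages, R does not require you to distinguish the precision of numeric values. The result <code>double</code> returned by <code>typeof()</code> indicates that the value is a (double-precision) floating-point number.</p>
<p>In R, everything you see is an <strong>object</strong>. <code>class()</code> returns the class of an object, here <code>numeric</code>.</p>
<p>Now let us look at how to represent integers in R. With the two utility functions above, it is easy to see that the following code does not behave as one might expect.</p>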
<div class="sourceCode" id="cb19"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb19-1"><a href="base.html#cb19-1"></a><span class="dv">3</span></span>
<span id="cb19-2"><a href="base.html#cb19-2"></a><span class="co">#&gt; [1] 3</span></span>
<span id="cb19-3"><a href="base.html#cb19-3"></a></span>
<span id="cb19-4"><a href="base.html#cb19-4"></a><span class="kw">typeof</span>(<span class="dv">3</span>)</span>
<span id="cb19-5"><a href="base.html#cb19-5"></a><span class="co">#&gt; [1] &quot;double&quot;</span></span>
<span id="cb19-6"><a href="base.html#cb19-6"></a><span class="kw">class</span>(<span class="dv">3</span>)</span>
<span id="cb19-7"><a href="base.html#cb19-7"></a><span class="co">#&gt; [1] &quot;numeric&quot;</span></span></code></pre></div>
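<p>For 3, <code>typeof()</code> and <code>class()</code> return exactly the same results as for 3.14! This is because R treats the input as a floating-point number even if only 3 is typed.</p>
<p>We can check this with the <code>identical()</code> function or the <code>is.integer()</code> function:</p>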
<div class="sourceCode" id="cb20"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb20-1"><a href="base.html#cb20-1"></a><span class="kw">identical</span>(<span class="dv">3</span>, <span class="fl">3.0</span>)</span>
<span id="cb20-2"><a href="base.html#cb20-2"></a><span class="co">#&gt; [1] TRUE</span></span>
<span id="cb20-3"><a href="base.html#cb20-3"></a></span>
<span id="cb20-4"><a href="base.html#cb20-4"></a><span class="kw">is.integer</span>(<span class="dv">3</span>)</span>
<span id="cb20-5"><a href="base.html#cb20-5"></a><span class="co">#&gt; [1] FALSE</span></span></code></pre></div>
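<p>The results are logical values, introduced later: <code>TRUE</code> means true and <code>FALSE</code> means false. We can therefore conclude that <code>3</code> is not an integer.</p>
<p>The correct way to write an integer is to append the suffix <code>L</code> to the number, as in <code>3L</code>.</p>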
<div class="sourceCode" id="cb21"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb21-1"><a href="base.html#cb21-1"></a><span class="kw">is.integer</span>(3L)</span>
<span id="cb21-2"><a href="base.html#cb21-2"></a><span class="co">#&gt; [1] TRUE</span></span>
<span id="cb21-3"><a href="base.html#cb21-3"></a></span>
<span id="cb21-4"><a href="base.html#cb21-4"></a><span class="kw">identical</span>(3L, <span class="dv">3</span>)</span>
<span id="cb21-5"><a href="base.html#cb21-5"></a><span class="co">#&gt; [1] FALSE</span></span></code></pre></div>
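<p>As a small supplementary sketch (not taken from the original text), the calls below look at <code>3L</code> with the same tools used above. The one function not introduced so far is base R's <code>as.integer()</code>, which converts a value to an integer and drops the fractional part.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">typeof(3L)
#&gt; [1] &quot;integer&quot;
class(3L)
#&gt; [1] &quot;integer&quot;
# == compares values, so an integer and a double holding the same number are equal,
# even though identical() reports that the two objects are not the same
3L == 3
#&gt; [1] TRUE
# as.integer() truncates toward zero rather than rounding
as.integer(3.9)
#&gt; [1] 3</code></pre></div>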
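<p>The <code>is.integer()</code> function belongs to the <code>is.xxx()</code> family, which helps determine whether an object is of a particular type. After you type <code>is.</code> in RStudio, its autocompletion lists the functions whose names start with <code>is.</code>.</p>
<p>Floating-point numbers and integers are both numeric, so both calls below return <code>TRUE</code>:</p>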
<div class="sourceCode" id="cb22"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb22-1"><a href="base.html#cb22-1"></a><span class="kw">is.numeric</span>(<span class="fl">3.14</span>)</span>
<span id="cb22-2"><a href="base.html#cb22-2"></a><span class="co">#&gt; [1] TRUE</span></span>
<span id="cb22-3"><a href="base.html#cb22-3"></a><span class="kw">is.numeric</span>(3L)</span>
<span id="cb22-4"><a href="base.html#cb22-4"></a><span class="co">#&gt; [1] TRUE</span></span></code></pre></div>
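<p>Outside RStudio, one possible way to browse the <code>is.</code> family is base R's <code>apropos()</code>, which searches object names with a regular expression. This is only a sketch: the variable name <code>is_functions</code> is illustrative, and the list returned depends on which packages are attached, so no output is shown for it.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># List objects on the search path whose names start with &quot;is.&quot;
is_functions &lt;- apropos(&quot;^is\\.&quot;)
head(is_functions)

# A few members of the family applied to simple values
is.double(3.14)
#&gt; [1] TRUE
is.double(3L)
#&gt; [1] FALSE
# A quoted value is text, not a number
is.numeric(&quot;3&quot;)
#&gt; [1] FALSE</code></pre></div>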
<p>Real-world data often come in groups, for example the heights of a group of students. R uses the <code>c()</code> function (<code>c</code> is short for <code>combine</code>) to combine data:</p>
<div class="sourceCode" id="cb23"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb23-1"><a href="base.html#cb23-1"></a><span class="kw">c</span>(<span class="fl">1.70</span>, <span class="fl">1.72</span>, <span class="fl">1.80</span>, <span class="fl">1.66</span>, <span class="fl">1.65</span>, <span class="fl">1.88</span>)</span>
<span id="cb23-2"><a href="base.html#cb23-2"></a><span class="co">#&gt; [1] 1.70 1.72 1.80 1.66 1.65 1.88</span></span></code></pre></div>
<p>We now have a set of height data.</p>
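<p>Because a vector is homogeneous, <code>c()</code> converts everything it combines to one common type. A minimal sketch of this behavior, using the illustrative variable names <code>ids</code> and <code>mixed</code> (not from the original text):</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># A vector built only from integers stays an integer vector
ids &lt;- c(1L, 2L, 3L)
typeof(ids)
#&gt; [1] &quot;integer&quot;

# Adding one double turns every element into a double
mixed &lt;- c(ids, 2.5)
typeof(mixed)
#&gt; [1] &quot;double&quot;
length(mixed)
#&gt; [1] 4
mixed
#&gt; [1] 1.0 2.0 3.0 2.5</code></pre></div>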
<p>With R's built-in <code>mean()</code> and <code>sd()</code> functions, we can easily compute the mean and standard deviation of these data:</p>
<div class="sourceCode" id="cb24"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb24-1"><a href="base.html#cb24-1"></a><span class="co"># Mean</span></span>
<span id="cb24-2"><a href="base.html#cb24-2"></a><span class="kw">mean</span>(<span class="kw">c</span>(<span class="fl">1.70</span>, <span class="fl">1.72</span>, <span class="fl">1.80</span>, <span class="fl">1.66</span>, <span class="fl">1.65</span>, <span class="fl">1.88</span>))</span>
<span id="cb24-3"><a href="base.html#cb24-3"></a><span class="co">#&gt; [1] 1.735</span></span>
<span id="cb24-4"><a href="base.html#cb24-4"></a><span class="co"># Standard deviation</span></span>
<span id="cb24-5"><a href="base.html#cb24-5"></a><span class="kw">sd</span>(<span class="kw">c</span>(<span class="fl">1.70</span>, <span class="fl">1.72</span>, <span class="fl">1.80</span>, <span class="fl">1.66</span>, <span class="fl">1.65</span>, <span class="fl">1.88</span>))</span>
<span id="cb24-6"><a href="base.html#cb24-6"></a><span class="co">#&gt; [1] 0.08894</span></span></code></pre></div>
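<p>In the calculation above we typed the height data twice. If a small slip had occurred while typing, say <code>1.65</code> written as <code>1.66</code> when computing the standard deviation, we would no longer be analyzing the same data! This is unlikely in such a simple calculation, but when 100 or even 1000 values have to be retyped, relying on the human eye to catch mistakes will almost inevitably fail.</p>
<p>One workaround is the system's copy-and-paste mechanism, but if a data set is reused hundreds of times, that is not practical either.</p>
<p>The proper solution is to introduce a symbol that <strong>refers to</strong> the data and to use that symbol wherever the data are needed. In programming languages such a symbol is usually called a <strong>variable</strong>, the term we use from here on.</p>
<p>The code block above can be rewritten as:</p>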
<div class="sourceCode" id="cb25"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb25-1"><a href="base.html#cb25-1"></a>heights &lt;-<span class="st"> </span><span class="kw">c</span>(<span class="fl">1.70</span>, <span class="fl">1.72</span>, <span class="fl">1.80</span>, <span class="fl">1.66</span>, <span class="fl">1.65</span>, <span class="fl">1.88</span>)</span>
<span id="cb25-2"><a href="base.html#cb25-2"></a><span class="kw">mean</span>(heights)</span>
<span id="cb25-3"><a href="base.html#cb25-3"></a><span class="co">#&gt; [1] 1.735</span></span>
<span id="cb25-4"><a href="base.html#cb25-4"></a><span class="kw">sd</span>(heights)</span>
<span id="cb25-5"><a href="base.html#cb25-5"></a><span class="co">#&gt; [1] 0.08894</span></span></code></pre></div>
<p>In R, <code>&lt;-</code> is called the assignment operator. It helps to read it as the direction in which the data flow; from that it is easy to guess that <code>-&gt;</code> is also valid:</p>
<div class="sourceCode" id="cb26"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb26-1"><a href="base.html#cb26-1"></a><span class="kw">c</span>(<span class="fl">1.70</span>, <span class="fl">1.72</span>, <span class="fl">1.80</span>, <span class="fl">1.66</span>, <span class="fl">1.65</span>, <span class="fl">1.88</span>) -&gt;<span class="st"> </span>heights2</span>
<span id="cb26-2"><a href="base.html#cb26-2"></a>heights2</span>
<span id="cb26-3"><a href="base.html#cb26-3"></a><span class="co">#&gt; [1] 1.70 1.72 1.80 1.66 1.65 1.88</span></span></code></pre></div>
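<p>In practice, however, <code>&lt;-</code> is the form normally used.</p>
<p>In addition, <code>=</code> has essentially the same meaning as <code>&lt;-</code> but is less commonly used. Readers with experience in other programming languages may use it as their everyday assignment operator. For the difference between the two, see the <strong>Common Problems and Solutions</strong> section of this chapter.</p>
<p>Variable names in R are subject to some restrictions, the most important being that a name must not start with a digit:</p>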
<div class="sourceCode" id="cb27"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb27-1"><a href="base.html#cb27-1"></a>3ab =<span class="st"> </span><span class="dv">3</span></span>
<span id="cb27-2"><a href="base.html#cb27-2"></a><span class="co">#&gt; Error: &lt;text&gt;:1:2: unexpected symbol</span></span>
<span id="cb27-3"><a href="base.html#cb27-3"></a><span class="co">#&gt; 1: 3ab</span></span>
<span id="cb27-4"><a href="base.html#cb27-4"></a><span class="co">#&gt;      ^</span></span></code></pre></div>
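<p>Two suggestions for naming variables:</p>
<ol style="list-style-type: decimal">
<li>Keep names of temporary variables simple, such as <code>i</code>, <code>j</code>, <code>k</code>.</li>
<li>Names tied to data should reflect what the data contain, as with <code>heights</code> in the code above; otherwise, without comments, readers of the code cannot quickly understand your program.</li>
</ol>
<p>In addition, there are two commonly recommended conventions for <strong>long variable names</strong>:</p>
<ol style="list-style-type: decimal">
<li>Camel case</li>
</ol>
<p>Taking the student height data as an example, the name can be written as <code>studentHeights</code>, following the pattern <code>aBcDeF</code>.</p>
<ol start="2" style="list-style-type: decimal">
<li>Snake case</li>
</ol>
<p>Words are separated by underscores, giving <code>student_heights</code>.</p>
<p>Both styles are common in R, and either is fine; <strong>what matters is that naming style stays consistent within an R script</strong>.</p>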
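<p>The two naming styles, and the rule about leading digits, can be tried out with a short sketch. The data values here are illustrative; <code>make.names()</code> is a base R function that shows how R itself would repair a name that is not syntactically valid.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Camel case
studentHeights &lt;- c(1.70, 1.72, 1.80)
# Snake case
student_heights &lt;- c(1.70, 1.72, 1.80)

# Both names hold equal data; only the naming style differs
identical(studentHeights, student_heights)
#&gt; [1] TRUE

# A name starting with a digit is invalid; make.names() prefixes it with &quot;X&quot;
make.names(&quot;3ab&quot;)
#&gt; [1] &quot;X3ab&quot;</code></pre></div>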
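<p>With vectors and variables covered, let us look at how computation on vectors works.</p>
<p>Suppose we have two sets of data, stored in the variables <code>a</code> and <code>b</code>:</p>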
<div class="sourceCode" id="cb28"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb28-1"><a href="base.html#cb28-1"></a>a &lt;-<span class="st"> </span><span class="kw">c</span>(<span class="dv">1</span>, <span class="dv">2</span>, <span class="dv">3</span>)</span>
<span id="cb28-2"><a href="base.html#cb28-2"></a>b &lt;-<span class="st"> </span><span class="kw">c</span>(<span class="dv">4</span>, <span class="dv">5</span>, <span class="dv">6</span>)</span></code></pre></div>
<p>Stacking them gives Figure <a href="base.html#fig:vector-construction">2.1</a>:</p>
<div class="figure" style="text-align: center"><span id="fig:vector-construction"></span>
<img src="bookdown_files/figure-html/vector-construction-1.png" alt="A visual representation of vectors" width="384" />
<p class="caption">
Figure 2.1: A visual representation of vectors
</p>
</div>
<p>What is the result of adding <code>a</code> and <code>b</code>?</p>
<div class="sourceCode" id="cb29"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb29-1"><a href="base.html#cb29-1"></a>a <span class="op">+</span><span class="st"> </span>b</span>
<span id="cb29-2"><a href="base.html#cb29-2"></a><span class="co">#&gt; [1] 5 7 9</span></span></code></pre></div>
<p>The sum of two vectors is the vector formed by adding their elements pairwise. What happens if the vectors differ in length?</p>
<p>Let us add <code>4</code> to <code>a</code> and see; the vectors now stack as shown in Figure <a href="base.html#fig:vector-add">2.2</a>:</p>
<div class="figure" style="text-align: center"><span id="fig:vector-add"></span>
<img src="bookdown_files/figure-html/vector-add-1.png" alt="Vectors of unequal length" width="384" />
<p class="caption">
Figure 2.2: Vectors of unequal length
</p>
</div>
<div class="sourceCode" id="cb30"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb30-1"><a href="base.html#cb30-1"></a>a <span class="op">+</span><span class="st"> </span><span class="dv">4</span></span>
<span id="cb30-2"><a href="base.html#cb30-2"></a><span class="co">#&gt; [1] 5 6 7</span></span></code></pre></div>
<p>The result is the same as that of <code>a + c(4, 4, 4)</code>:</p>
<div class="sourceCode" id="cb31"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb31-1"><a href="base.html#cb31-1"></a>a <span class="op">+</span><span class="st"> </span><span class="kw">c</span>(<span class="dv">4</span>, <span class="dv">4</span>, <span class="dv">4</span>)</span>
<span id="cb31-2"><a href="base.html#cb31-2"></a><span class="co">#&gt; [1] 5 6 7</span></span></code></pre></div>
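<p>So when two vectors differ in length, the shorter one is first repeated until it lines up with the longer one (Figure <a href="base.html#fig:vector-align">2.3</a>), and then the addition is carried out.</p>
<div class="figure" style="text-align: center"><span id="fig:vector-align"></span>
<img src="bookdown_files/figure-html/vector-align-1.png" alt="Vector alignment" width="384" />
<p class="caption">
Figure 2.3: Vector alignment
</p>
</div>
<p>Note that the longer vector is left unchanged in this process. If the repeated shorter vector cannot be aligned exactly, that is, if the longer length is not a multiple of the shorter one, the surplus elements are discarded and R issues a warning along with the result.</p>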
<div class="sourceCode" id="cb32"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb32-1"><a href="base.html#cb32-1"></a><span class="kw">c</span>(<span class="dv">1</span>, <span class="dv">2</span>, <span class="dv">3</span>) <span class="op">+</span><span class="st"> </span><span class="kw">c</span>(<span class="dv">4</span>, <span class="dv">5</span>)</span>
<span id="cb32-2"><a href="base.html#cb32-2"></a><span class="co">#&gt; Warning in c(1, 2, 3) + c(4, 5): longer object length</span></span>
<span id="cb32-3"><a href="base.html#cb32-3"></a><span class="co">#&gt; is not a multiple of shorter object length</span></span>
<span id="cb32-4"><a href="base.html#cb32-4"></a><span class="co">#&gt; [1] 5 7 7</span></span>
<span id="cb32-5"><a href="base.html#cb32-5"></a><span class="co"># The addition above is equivalent to c(1, 2, 3) + c(4, 5, 4)</span></span></code></pre></div>
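<p>The repetition step can be made visible with base R's <code>rep_len()</code>, which repeats a vector up to a requested length. The sketch below uses the illustrative names <code>short</code> and <code>recycled</code>; it mimics by hand what R does implicitly, and is not how R implements the operation internally.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">short &lt;- c(4, 5)

# Repeat the shorter vector up to length 3; the surplus element is dropped
recycled &lt;- rep_len(short, 3)
recycled
#&gt; [1] 4 5 4

# Adding the explicitly repeated vector gives the same values, without a warning
c(1, 2, 3) + recycled
#&gt; [1] 5 7 7

# When the longer length is a multiple of the shorter one, R does not warn
c(1, 2, 3, 4) + c(10, 20)
#&gt; [1] 11 22 13 24</code></pre></div>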
<p>This whole process is called <strong>vectorized computation</strong>. Apart from addition, the other element-wise arithmetic operations on vectors work in the same way.</p>
<div class="sourceCode" id="cb33"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb33-1"><a href="base.html#cb33-1"></a><span class="co"># Subtraction</span></span>
<span id="cb33-2"><a href="base.html#cb33-2"></a>a <span class="op">-</span><span class="st"> </span>b</span>
<span id="cb33-3"><a href="base.html#cb33-3"></a><span class="co">#&gt; [1] -3 -3 -3</span></span>
<span id="cb33-4"><a href="base.html#cb33-4"></a><span class="co"># Division</span></span>
<span id="cb33-5"><a href="base.html#cb33-5"></a>a <span class="op">/</span><span class="st"> </span>b</span>
<span id="cb33-6"><a href="base.html#cb33-6"></a><span class="co">#&gt; [1] 0.25 0.40 0.50</span></span>
<span id="cb33-7"><a href="base.html#cb33-7"></a><span class="co"># Multiplication</span></span>
<span id="cb33-8"><a href="base.html#cb33-8"></a>a <span class="op">*</span><span class="st"> </span>b</span>
<span id="cb33-9"><a href="base.html#cb33-9"></a><span class="co">#&gt; [1]  4 10 18</span></span>
<span id="cb33-10"><a href="base.html#cb33-10"></a><span class="co"># Integer division</span></span>
<span id="cb33-11"><a href="base.html#cb33-11"></a>a <span class="op">%/%</span><span class="st"> </span>b</span>
<span id="cb33-12"><a href="base.html#cb33-12"></a><span class="co">#&gt; [1] 0 0 0</span></span>
<span id="cb33-13"><a href="base.html#cb33-13"></a><span class="co"># Remainder</span></span>
<span id="cb33-14"><a href="base.html#cb33-14"></a>a <span class="op">%%</span><span class="st"> </span>b</span>
<span id="cb33-15"><a href="base.html#cb33-15"></a><span class="co">#&gt; [1] 1 2 3</span></span>
<span id="cb33-16"><a href="base.html#cb33-16"></a><span class="co"># Square</span></span>
<span id="cb33-17"><a href="base.html#cb33-17"></a>a <span class="op">^</span><span class="st"> </span><span class="dv">2</span></span>
<span id="cb33-18"><a href="base.html#cb33-18"></a><span class="co">#&gt; [1] 1 4 9</span></span>
<span id="cb33-19"><a href="base.html#cb33-19"></a><span class="co"># Logarithm (base 2)</span></span>
<span id="cb33-20"><a href="base.html#cb33-20"></a><span class="kw">log</span>(a, <span class="dt">base =</span> <span class="dv">2</span>)</span>
<span id="cb33-21"><a href="base.html#cb33-21"></a><span class="co">#&gt; [1] 0.000 1.000 1.585</span></span></code></pre></div>
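<p>To see what an element-wise operation amounts to, it can be compared with an explicit loop that handles one pair of elements at a time. This is only an illustrative sketch: <code>for</code> loops are covered in section 2.2.2, and the variable name <code>result</code> is not from the original text.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">a &lt;- c(1, 2, 3)
b &lt;- c(4, 5, 6)

# Pre-allocate a numeric vector of the right length, then fill it pair by pair
result &lt;- numeric(length(a))
for (i in seq_along(a)) {
  result[i] &lt;- a[i] * b[i]
}
result
#&gt; [1]  4 10 18

# The single vectorized expression gives exactly the same vector
identical(result, a * b)
#&gt; [1] TRUE</code></pre></div>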
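<p>In essence, <strong>vectorized computation</strong> applies an operation to paired elements of vectors. This makes data handling in R very convenient: however many elements a vector has, we can write the computation as if it were a scalar.</p>
<p>For example, let us compute the mean and standard deviation of <code>heights</code>, this time directly from the formulas rather than with R's built-in functions:</p>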
<p><span class="math display">\[
\mu = \frac{\sum x_i}{n}
\]</span></p>
<p><span class="math display">\[
sd = \sqrt{\frac{\sum (x_i - \mu)^2}{n - 1}}
\]</span></p>
<blockquote>
<p><code>n-1</code> rather than <code>n</code> is used when computing <code>sd</code> because we are computing the <strong>sample</strong> standard deviation.</p>
</blockquote>
<p>实际操作如下：</p>
<div class="sourceCode" id="cb34"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb34-1"><a href="base.html#cb34-1"></a><span class="co"># 先计算均值</span></span>
<span id="cb34-2"><a href="base.html#cb34-2"></a>heightsMean &lt;-<span class="st"> </span><span class="kw">sum</span>(heights) <span class="op">/</span><span class="st"> </span><span class="kw">length</span>(heights)</span>
<span id="cb34-3"><a href="base.html#cb34-3"></a>heightsMean</span>
<span id="cb34-4"><a href="base.html#cb34-4"></a><span class="co">#&gt; [1] 1.735</span></span>
<span id="cb34-5"><a href="base.html#cb34-5"></a><span class="co"># Compute the standard deviation</span></span>
<span id="cb34-6"><a href="base.html#cb34-6"></a>heightsSD &lt;-<span class="st"> </span><span class="kw">sqrt</span>( <span class="kw">sum</span>( (heights <span class="op">-</span><span class="st"> </span>heightsMean)<span class="op">^</span><span class="st"> </span><span class="dv">2</span>) <span class="op">/</span><span class="st"> </span>(<span class="kw">length</span>(heights) <span class="op">-</span><span class="st"> </span><span class="dv">1</span>) )</span>
<span id="cb34-7"><a href="base.html#cb34-7"></a>heightsSD</span>
<span id="cb34-8"><a href="base.html#cb34-8"></a><span class="co">#&gt; [1] 0.08894</span></span></code></pre></div>
<p>Compare these results with those from R's built-in functions:</p>
<div class="sourceCode" id="cb35"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb35-1"><a href="base.html#cb35-1"></a><span class="kw">mean</span>(heights)</span>
<span id="cb35-2"><a href="base.html#cb35-2"></a><span class="co">#&gt; [1] 1.735</span></span>
<span id="cb35-3"><a href="base.html#cb35-3"></a><span class="kw">sd</span>(heights)</span>
<span id="cb35-4"><a href="base.html#cb35-4"></a><span class="co">#&gt; [1] 0.08894</span></span></code></pre></div>
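<p>To see what the choice between <code>n-1</code> and <code>n</code> changes, here is a small sketch on a made-up vector <code>x</code> (hypothetical data, not the <code>heights</code> vector). The names <code>sd_sample</code> and <code>sd_population</code> are illustrative only:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical data chosen so the arithmetic is easy to follow
x &lt;- c(2, 4, 4, 4, 5, 5, 7, 9)
n &lt;- length(x)
mu &lt;- sum(x) / n
# Sample standard deviation: divide by n - 1 (this is what sd() computes)
sd_sample &lt;- sqrt(sum((x - mu)^2) / (n - 1))
# Population standard deviation: divide by n
sd_population &lt;- sqrt(sum((x - mu)^2) / n)
all.equal(sd_sample, sd(x))
#&gt; [1] TRUE
sd_population
#&gt; [1] 2</code></pre></div>
<p>The sample version is always slightly larger than the population version, and the gap shrinks as the number of elements grows.</p>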
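<p>Note that we used a few other R utility functions above: <code>length()</code> returns the length of a vector, <code>sum()</code> returns the sum of its elements, and <code>sqrt()</code> computes the square root.</p>
<p>Beginners may find some of the functions used in these calculations unfamiliar. This is <strong>perfectly normal</strong>; I cannot remember every function R provides either. Programming is a practical skill: you become familiar with it through use, not through rote memorization. When you need a function you do not know, here are a few suggestions:</p>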
<ol style="list-style-type: decimal">
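<li>Guess the function name. R function names follow recognizable patterns and mostly correspond to English words, so try guessing a name first and see whether it exists.</li>
<li>Use R's documentation system. It is extensive: typing <code>?numeric</code> in the R console shows the help for <code>numeric</code>, while <code>??numeric</code> runs a broader search. Learning to read and understand function documentation is an essential R skill.</li>
<li>Use a search engine. The problems that (beginners) run into have almost always been met by someone else. R has a large user base, and many blogs and forums document R usage and discuss problems, so when the two approaches above do not solve your problem, search for related material.</li>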
</ol>
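<p>The first two suggestions can be tried directly in the console. The sketch below uses only functions shipped with base R (<code>help()</code>, <code>help.search()</code>, <code>apropos()</code>); the topics searched for are arbitrary examples:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Open the help page for a known function (same as typing ?sum)
help(&quot;sum&quot;)
# Search the installed documentation for a phrase (same as ??&quot;standard deviation&quot;)
help.search(&quot;standard deviation&quot;)
# List objects whose names match a pattern, handy when guessing a function name
apropos(&quot;^mean&quot;)</code></pre></div>
<p><code>apropos()</code> returns a character vector of matching names, so a guess such as "is there something starting with <code>mean</code>?" can be checked without leaving R.</p>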
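<p>In this subsection we used numeric data to learn some important basic R concepts and operations. Extending this knowledge to the other basic data types is now relatively easy.</p>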
</div>
<div id="字符串" class="section level4">
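<h4><span class="header-section-number">2.1.1.2</span> Strings</h4>
<p>Besides numeric data, text data is also common in everyday data processing, for example sex recorded as “男” (male) or “女” (female), or education level recorded as “secondary school” or “university”.</p>
<p>In R, you cannot create a string simply by typing non-numeric characters:</p>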
<div class="sourceCode" id="cb36"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb36-1"><a href="base.html#cb36-1"></a>男</span>
<span id="cb36-2"><a href="base.html#cb36-2"></a><span class="co">#&gt; Error in eval(expr, envir, enclos): object &#39;男&#39; not found</span></span></code></pre></div>
<p>Text must be enclosed in single quotes <code>''</code> or double quotes <code>""</code> to create a string:</p>
<div class="sourceCode" id="cb37"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb37-1"><a href="base.html#cb37-1"></a><span class="st">&#39;男&#39;</span></span>
<span id="cb37-2"><a href="base.html#cb37-2"></a><span class="co">#&gt; [1] &quot;男&quot;</span></span>
<span id="cb37-3"><a href="base.html#cb37-3"></a><span class="st">&quot;女&quot;</span></span>
<span id="cb37-4"><a href="base.html#cb37-4"></a><span class="co">#&gt; [1] &quot;女&quot;</span></span></code></pre></div>
<div class="sourceCode" id="cb38"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb38-1"><a href="base.html#cb38-1"></a><span class="kw">typeof</span>(<span class="st">&quot;abcde&quot;</span>)</span>
<span id="cb38-2"><a href="base.html#cb38-2"></a><span class="co">#&gt; [1] &quot;character&quot;</span></span>
<span id="cb38-3"><a href="base.html#cb38-3"></a><span class="kw">class</span>(<span class="st">&quot;abcde&quot;</span>)</span>
<span id="cb38-4"><a href="base.html#cb38-4"></a><span class="co">#&gt; [1] &quot;character&quot;</span></span></code></pre></div>
<p>The function <code>nchar()</code> is commonly used to get the number of characters in a string:</p>
<div class="sourceCode" id="cb39"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb39-1"><a href="base.html#cb39-1"></a><span class="kw">nchar</span>(<span class="st">&quot;abcde&quot;</span>)</span>
<span id="cb39-2"><a href="base.html#cb39-2"></a><span class="co">#&gt; [1] 5</span></span></code></pre></div>
<p>Note that this is different from getting the number of elements in a character vector:</p>
<div class="sourceCode" id="cb40"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb40-1"><a href="base.html#cb40-1"></a>abc &lt;-<span class="st"> </span><span class="kw">c</span>(<span class="st">&quot;abcde&quot;</span>, <span class="st">&quot;f&quot;</span>, <span class="st">&quot;g&quot;</span>)</span>
<span id="cb40-2"><a href="base.html#cb40-2"></a><span class="kw">length</span>(abc)</span>
<span id="cb40-3"><a href="base.html#cb40-3"></a><span class="co">#&gt; [1] 3</span></span>
<span id="cb40-4"><a href="base.html#cb40-4"></a><span class="kw">nchar</span>(abc)</span>
<span id="cb40-5"><a href="base.html#cb40-5"></a><span class="co">#&gt; [1] 5 1 1</span></span></code></pre></div>
<p>Strings are often used in set operations such as intersection, union, and difference:</p>
<div class="sourceCode" id="cb41"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb41-1"><a href="base.html#cb41-1"></a><span class="co"># Intersection</span></span>
<span id="cb41-2"><a href="base.html#cb41-2"></a><span class="kw">intersect</span>(<span class="kw">c</span>(<span class="st">&quot;a&quot;</span>, <span class="st">&quot;b&quot;</span>, <span class="st">&quot;c&quot;</span>),</span>
<span id="cb41-3"><a href="base.html#cb41-3"></a>          <span class="kw">c</span>(<span class="st">&quot;a&quot;</span>, <span class="st">&quot;b&quot;</span>, <span class="st">&quot;d&quot;</span>))</span>
<span id="cb41-4"><a href="base.html#cb41-4"></a><span class="co">#&gt; [1] &quot;a&quot; &quot;b&quot;</span></span>
<span id="cb41-5"><a href="base.html#cb41-5"></a><span class="co"># Union</span></span>
<span id="cb41-6"><a href="base.html#cb41-6"></a><span class="kw">union</span>(<span class="kw">c</span>(<span class="st">&quot;a&quot;</span>, <span class="st">&quot;b&quot;</span>, <span class="st">&quot;c&quot;</span>),</span>
<span id="cb41-7"><a href="base.html#cb41-7"></a>      <span class="kw">c</span>(<span class="st">&quot;a&quot;</span>, <span class="st">&quot;b&quot;</span>, <span class="st">&quot;d&quot;</span>))</span>
<span id="cb41-8"><a href="base.html#cb41-8"></a><span class="co">#&gt; [1] &quot;a&quot; &quot;b&quot; &quot;c&quot; &quot;d&quot;</span></span>
<span id="cb41-9"><a href="base.html#cb41-9"></a><span class="co"># Difference</span></span>
<span id="cb41-10"><a href="base.html#cb41-10"></a><span class="kw">setdiff</span>(<span class="kw">c</span>(<span class="st">&quot;a&quot;</span>, <span class="st">&quot;b&quot;</span>, <span class="st">&quot;c&quot;</span>),</span>
<span id="cb41-11"><a href="base.html#cb41-11"></a>        <span class="kw">c</span>(<span class="st">&quot;a&quot;</span>, <span class="st">&quot;b&quot;</span>, <span class="st">&quot;d&quot;</span>))</span>
<span id="cb41-12"><a href="base.html#cb41-12"></a><span class="co">#&gt; [1] &quot;c&quot;</span></span></code></pre></div>
<p>Note that set operations also work on other data types; readers may wish to try this.</p>
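<p>As one way to try this, the same three functions can be applied to integer vectors. The vectors <code>x</code> and <code>y</code> are made up for illustration, and <code>%in%</code> is shown as a related membership test:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical integer vectors
x &lt;- 1:5
y &lt;- 4:8
intersect(x, y)
#&gt; [1] 4 5
union(x, y)
#&gt; [1] 1 2 3 4 5 6 7 8
setdiff(x, y)
#&gt; [1] 1 2 3
# %in% tests, element by element, whether each value of x occurs in y
x %in% y
#&gt; [1] FALSE FALSE FALSE  TRUE  TRUE</code></pre></div>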
</div>
<div id="因子" class="section level4">
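<h4><span class="header-section-number">2.1.1.3</span> Factors</h4>
<p>A factor is a special kind of string vector: it adds level information, which makes it better suited to storing and displaying categorical text data. It is created as follows:</p>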
<div class="sourceCode" id="cb42"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb42-1"><a href="base.html#cb42-1"></a>sex &lt;-<span class="st"> </span><span class="kw">factor</span>(<span class="kw">c</span>(<span class="st">&quot;Male&quot;</span>, <span class="st">&quot;Female&quot;</span>, <span class="st">&quot;Female&quot;</span>, <span class="st">&quot;Male&quot;</span>, <span class="st">&quot;Male&quot;</span>))</span>
<span id="cb42-2"><a href="base.html#cb42-2"></a>sex</span>
<span id="cb42-3"><a href="base.html#cb42-3"></a><span class="co">#&gt; [1] Male   Female Female Male   Male  </span></span>
<span id="cb42-4"><a href="base.html#cb42-4"></a><span class="co">#&gt; Levels: Female Male</span></span></code></pre></div>
<p>Besides the elements of the vector, the output above also shows the levels of the variable <code>sex</code>. The levels can be retrieved with the <code>levels()</code> function.</p>
<div class="sourceCode" id="cb43"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb43-1"><a href="base.html#cb43-1"></a><span class="kw">levels</span>(sex)</span>
<span id="cb43-2"><a href="base.html#cb43-2"></a><span class="co">#&gt; [1] &quot;Female&quot; &quot;Male&quot;</span></span></code></pre></div>
<p>Renaming a factor level updates all of the corresponding elements:</p>
<div class="sourceCode" id="cb44"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb44-1"><a href="base.html#cb44-1"></a><span class="kw">levels</span>(sex) &lt;-<span class="st"> </span><span class="kw">c</span>(<span class="st">&quot;Female&quot;</span>, <span class="st">&quot;男的&quot;</span>)</span>
<span id="cb44-2"><a href="base.html#cb44-2"></a>sex</span>
<span id="cb44-3"><a href="base.html#cb44-3"></a><span class="co">#&gt; [1] 男的   Female Female 男的   男的  </span></span>
<span id="cb44-4"><a href="base.html#cb44-4"></a><span class="co">#&gt; Levels: Female 男的</span></span></code></pre></div>
<p>Levels can be specified when the factor is created. Any value that has no matching level is converted to <code>NA</code> (short for Not Available), a special value in R that represents unknown data.</p>
<div class="sourceCode" id="cb45"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb45-1"><a href="base.html#cb45-1"></a><span class="kw">factor</span>(<span class="kw">c</span>(<span class="st">&quot;Male&quot;</span>, <span class="st">&quot;Female&quot;</span>, <span class="st">&quot;Female&quot;</span>, <span class="st">&quot;Male&quot;</span>, <span class="st">&quot;Male&quot;</span>, <span class="st">&quot;M&quot;</span>, <span class="st">&quot;M&quot;</span>), <span class="dt">levels =</span> <span class="kw">c</span>(<span class="st">&quot;Male&quot;</span>, <span class="st">&quot;Female&quot;</span>))</span>
<span id="cb45-2"><a href="base.html#cb45-2"></a><span class="co">#&gt; [1] Male   Female Female Male   Male   &lt;NA&gt;   &lt;NA&gt;  </span></span>
<span id="cb45-3"><a href="base.html#cb45-3"></a><span class="co">#&gt; Levels: Male Female</span></span></code></pre></div>
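<p>Because unmatched values silently become <code>NA</code>, it can help to check for them after creating a factor. A minimal sketch, using a made-up factor <code>f</code> (hypothetical name and data):</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical data: &quot;M&quot; is not one of the declared levels
f &lt;- factor(c(&quot;Male&quot;, &quot;Female&quot;, &quot;M&quot;), levels = c(&quot;Male&quot;, &quot;Female&quot;))
# NA cannot be detected with ==; use is.na() instead
is.na(f)
#&gt; [1] FALSE FALSE  TRUE
# Count how many values did not match any level
sum(is.na(f))
#&gt; [1] 1
# table() counts each level; useNA = &quot;ifany&quot; also reports the NA entries
table(f, useNA = &quot;ifany&quot;)</code></pre></div>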
<p>Besides levels, we can also attach labels to the categories to show what each one stands for:</p>
<div class="sourceCode" id="cb46"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb46-1"><a href="base.html#cb46-1"></a><span class="kw">factor</span>(<span class="kw">c</span>(<span class="st">&quot;Male&quot;</span>, <span class="st">&quot;Female&quot;</span>, <span class="st">&quot;Female&quot;</span>, <span class="st">&quot;Male&quot;</span>, <span class="st">&quot;Male&quot;</span>, <span class="st">&quot;M&quot;</span>, <span class="st">&quot;M&quot;</span>), </span>
<span id="cb46-2"><a href="base.html#cb46-2"></a>       <span class="dt">levels =</span> <span class="kw">c</span>(<span class="st">&quot;Male&quot;</span>, <span class="st">&quot;Female&quot;</span>),</span>
<span id="cb46-3"><a href="base.html#cb46-3"></a>       <span class="dt">labels =</span> <span class="kw">c</span>(<span class="st">&quot;性别：男&quot;</span>, <span class="st">&quot;性别：女&quot;</span>))</span>
<span id="cb46-4"><a href="base.html#cb46-4"></a><span class="co">#&gt; [1] 性别：男 性别：女 性别：女 性别：男 性别：男</span></span>
<span id="cb46-5"><a href="base.html#cb46-5"></a><span class="co">#&gt; [6] &lt;NA&gt;     &lt;NA&gt;    </span></span>
<span id="cb46-6"><a href="base.html#cb46-6"></a><span class="co">#&gt; Levels: 性别：男 性别：女</span></span></code></pre></div>
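<p>Beginners should take extra care here: R syntax (operators, brackets, quotes, and commas) must be typed with English (half-width) characters, and Chinese text and special characters are best kept inside strings. Mixing the two up is a common cause of errors.</p>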
</div>
<div id="逻辑值" class="section level4">
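<h4><span class="header-section-number">2.1.1.4</span> Logical values</h4>
<p>There are only 2 logical values, <code>TRUE</code> and <code>FALSE</code>, abbreviated as <code>T</code> and <code>F</code>. Logical values are generally not used to store information directly; instead they control the logic of a program, which is covered in the "Control Structures and Loops" section of this chapter.</p>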
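<p>One caveat about the abbreviations: <code>TRUE</code> and <code>FALSE</code> are reserved words, whereas <code>T</code> and <code>F</code> are ordinary variables that can be overwritten. The sketch below overwrites <code>T</code> on purpose to show the effect; this is deliberately bad practice and only for illustration:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># By default T is a variable whose value is TRUE
identical(T, TRUE)
#&gt; [1] TRUE
# It can be overwritten, which silently changes any later code that uses T
T &lt;- 0
identical(T, TRUE)
#&gt; [1] FALSE
# Remove our copy so that T refers to the built-in value again
rm(T)
identical(T, TRUE)
#&gt; [1] TRUE</code></pre></div>
<p>For this reason, writing <code>TRUE</code> and <code>FALSE</code> in full is the safer habit in scripts.</p>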
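<p>Another important use of logical values is subsetting data, where they are often more concise than integer indices.</p>
<p>Let us first see how to extract a subset with integer indices, for example the 2nd element of the variable <code>heights</code>:</p>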
<div class="sourceCode" id="cb47"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb47-1"><a href="base.html#cb47-1"></a>heights[<span class="dv">2</span>]</span>
<span id="cb47-2"><a href="base.html#cb47-2"></a><span class="co">#&gt; [1] 1.72</span></span></code></pre></div>
<p>Next, extract the 2nd through 5th elements, which forms a new vector:</p>
<div class="sourceCode" id="cb48"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb48-1"><a href="base.html#cb48-1"></a>heights[<span class="dv">2</span><span class="op">:</span><span class="dv">5</span>]</span>
<span id="cb48-2"><a href="base.html#cb48-2"></a><span class="co">#&gt; [1] 1.72 1.80 1.66 1.65</span></span></code></pre></div>
<p>Here <code>2:5</code> is a shortcut that generates the integer vector <code>c(2, 3, 4, 5)</code>:</p>
<div class="sourceCode" id="cb49"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb49-1"><a href="base.html#cb49-1"></a><span class="dv">2</span><span class="op">:</span><span class="dv">5</span></span>
<span id="cb49-2"><a href="base.html#cb49-2"></a><span class="co">#&gt; [1] 2 3 4 5</span></span></code></pre></div>
<p>A negative index removes the corresponding element:</p>
<div class="sourceCode" id="cb50"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb50-1"><a href="base.html#cb50-1"></a>heights</span>
<span id="cb50-2"><a href="base.html#cb50-2"></a><span class="co">#&gt; [1] 1.70 1.72 1.80 1.66 1.65 1.88</span></span>
<span id="cb50-3"><a href="base.html#cb50-3"></a>heights[<span class="op">-</span><span class="dv">2</span>]</span>
<span id="cb50-4"><a href="base.html#cb50-4"></a><span class="co">#&gt; [1] 1.70 1.80 1.66 1.65 1.88</span></span></code></pre></div>
<p>In real work, the subset we need is rarely this orderly, so we use comparison operators and the <code>which()</code> function to obtain the indices of the subset.</p>
<p>For example, find the heights greater than 1.7:</p>
<div class="sourceCode" id="cb51"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb51-1"><a href="base.html#cb51-1"></a><span class="co"># First use which() to find the indices</span></span>
<span id="cb51-2"><a href="base.html#cb51-2"></a><span class="kw">which</span>(heights <span class="op">&gt;</span><span class="st"> </span><span class="fl">1.7</span>)</span>
<span id="cb51-3"><a href="base.html#cb51-3"></a><span class="co">#&gt; [1] 2 3 6</span></span>
<span id="cb51-4"><a href="base.html#cb51-4"></a></span>
<span id="cb51-5"><a href="base.html#cb51-5"></a><span class="co"># Then combine it with subsetting to extract the data</span></span>
<span id="cb51-6"><a href="base.html#cb51-6"></a>heights[<span class="kw">which</span>(heights <span class="op">&gt;</span><span class="st"> </span><span class="fl">1.7</span>)]</span>
<span id="cb51-7"><a href="base.html#cb51-7"></a><span class="co">#&gt; [1] 1.72 1.80 1.88</span></span></code></pre></div>
<p>In fact, there is no need to bring in <code>which()</code> to return integer indices: the comparison <code>heights &gt; 1.7</code> yields a logical vector, which can itself be used as an index for subsetting.</p>
<div class="sourceCode" id="cb52"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb52-1"><a href="base.html#cb52-1"></a>heights <span class="op">&gt;</span><span class="st"> </span><span class="fl">1.7</span></span>
<span id="cb52-2"><a href="base.html#cb52-2"></a><span class="co">#&gt; [1] FALSE  TRUE  TRUE FALSE FALSE  TRUE</span></span>
<span id="cb52-3"><a href="base.html#cb52-3"></a>heights[heights <span class="op">&gt;</span><span class="st"> </span><span class="fl">1.7</span>]</span>
<span id="cb52-4"><a href="base.html#cb52-4"></a><span class="co">#&gt; [1] 1.72 1.80 1.88</span></span></code></pre></div>
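<p>Elements matching <code>TRUE</code> are kept, and elements matching <code>FALSE</code> are dropped. Remember that logical indexing is the preferred way to subset: it is more direct, because no intermediate integer index is needed.</p>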
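<p>Logical vectors can also be combined before they are used as an index. A small sketch, assuming a vector <code>h</code> (a hypothetical name) that holds the same six values as <code>heights</code>:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical vector with the same values as heights
h &lt;- c(1.70, 1.72, 1.80, 1.66, 1.65, 1.88)
# &amp; means &quot;and&quot;: keep values strictly between 1.66 and 1.80
h[h &gt; 1.66 &amp; h &lt; 1.80]
#&gt; [1] 1.70 1.72
# | means &quot;or&quot;: keep values below 1.66 or above 1.80
h[h &lt; 1.66 | h &gt; 1.80]
#&gt; [1] 1.65 1.88
# ! negates a condition: keep values that are not greater than 1.7
h[!(h &gt; 1.7)]
#&gt; [1] 1.70 1.66 1.65</code></pre></div>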
</div>
<div id="深入向量" class="section level4">
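<h4><span class="header-section-number">2.1.1.5</span> More on vectors</h4>
<p>Besides data, a vector can also store attributes related to that data. For example, to display <code>heights</code> more clearly, we can add a names attribute.</p>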
<div class="sourceCode" id="cb53"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb53-1"><a href="base.html#cb53-1"></a><span class="kw">names</span>(heights) &lt;-<span class="st"> </span><span class="kw">paste</span>(<span class="st">&quot;Student:&quot;</span>, <span class="dv">1</span><span class="op">:</span><span class="dv">6</span>)</span>
<span id="cb53-2"><a href="base.html#cb53-2"></a>heights</span>
<span id="cb53-3"><a href="base.html#cb53-3"></a><span class="co">#&gt; Student: 1 Student: 2 Student: 3 Student: 4 Student: 5 </span></span>
<span id="cb53-4"><a href="base.html#cb53-4"></a><span class="co">#&gt;       1.70       1.72       1.80       1.66       1.65 </span></span>
<span id="cb53-5"><a href="base.html#cb53-5"></a><span class="co">#&gt; Student: 6 </span></span>
<span id="cb53-6"><a href="base.html#cb53-6"></a><span class="co">#&gt;       1.88</span></span></code></pre></div>
<p>In the code above, <code>paste()</code> joins two vectors element by element, separated by a space by default.</p>
<div class="sourceCode" id="cb54"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb54-1"><a href="base.html#cb54-1"></a><span class="kw">paste</span>(<span class="st">&quot;Student:&quot;</span>, <span class="dv">1</span><span class="op">:</span><span class="dv">6</span>)</span>
<span id="cb54-2"><a href="base.html#cb54-2"></a><span class="co">#&gt; [1] &quot;Student: 1&quot; &quot;Student: 2&quot; &quot;Student: 3&quot; &quot;Student: 4&quot;</span></span>
<span id="cb54-3"><a href="base.html#cb54-3"></a><span class="co">#&gt; [5] &quot;Student: 5&quot; &quot;Student: 6&quot;</span></span>
<span id="cb54-4"><a href="base.html#cb54-4"></a><span class="co"># Change the separator</span></span>
<span id="cb54-5"><a href="base.html#cb54-5"></a><span class="kw">paste</span>(<span class="st">&quot;Student&quot;</span>, <span class="dv">1</span><span class="op">:</span><span class="dv">6</span>, <span class="dt">sep =</span> <span class="st">&quot;-&quot;</span>)</span>
<span id="cb54-6"><a href="base.html#cb54-6"></a><span class="co">#&gt; [1] &quot;Student-1&quot; &quot;Student-2&quot; &quot;Student-3&quot; &quot;Student-4&quot;</span></span>
<span id="cb54-7"><a href="base.html#cb54-7"></a><span class="co">#&gt; [5] &quot;Student-5&quot; &quot;Student-6&quot;</span></span></code></pre></div>
<p>The <code>names()</code> function can be used not only to set the names attribute but also to view it:</p>
<div class="sourceCode" id="cb55"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb55-1"><a href="base.html#cb55-1"></a><span class="kw">names</span>(heights)</span>
<span id="cb55-2"><a href="base.html#cb55-2"></a><span class="co">#&gt; [1] &quot;Student: 1&quot; &quot;Student: 2&quot; &quot;Student: 3&quot; &quot;Student: 4&quot;</span></span>
<span id="cb55-3"><a href="base.html#cb55-3"></a><span class="co">#&gt; [5] &quot;Student: 5&quot; &quot;Student: 6&quot;</span></span></code></pre></div>
<p>Many R functions work like <code>names()</code>: they can be used both to modify and to retrieve the corresponding information.</p>
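<p>Names are also useful for subsetting, since an element can be extracted by its name instead of its position. A minimal sketch with a made-up named vector <code>scores</code> (hypothetical name and values):</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical named vector
scores &lt;- c(math = 90, art = 75, music = 82)
# Extract a single element by name
scores[&quot;art&quot;]
#&gt; art 
#&gt;  75
# Extract several elements by name
scores[c(&quot;math&quot;, &quot;music&quot;)]
# unname() drops the names and keeps only the values
unname(scores[&quot;art&quot;])
#&gt; [1] 75</code></pre></div>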
<p>In addition, the attributes of an R object can be viewed with the <code>attributes()</code> function:</p>
<div class="sourceCode" id="cb56"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb56-1"><a href="base.html#cb56-1"></a><span class="kw">attributes</span>(heights)</span>
<span id="cb56-2"><a href="base.html#cb56-2"></a><span class="co">#&gt; $names</span></span>
<span id="cb56-3"><a href="base.html#cb56-3"></a><span class="co">#&gt; [1] &quot;Student: 1&quot; &quot;Student: 2&quot; &quot;Student: 3&quot; &quot;Student: 4&quot;</span></span>
<span id="cb56-4"><a href="base.html#cb56-4"></a><span class="co">#&gt; [5] &quot;Student: 5&quot; &quot;Student: 6&quot;</span></span></code></pre></div>
<p>R's default class system is very flexible, and we can set arbitrary attributes, for example a class-name attribute:</p>
<div class="sourceCode" id="cb57"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb57-1"><a href="base.html#cb57-1"></a><span class="kw">attr</span>(heights, <span class="st">&quot;class-name&quot;</span>) &lt;-<span class="st"> &quot;A&quot;</span></span>
<span id="cb57-2"><a href="base.html#cb57-2"></a><span class="kw">attr</span>(heights, <span class="st">&quot;class-name&quot;</span>)</span>
<span id="cb57-3"><a href="base.html#cb57-3"></a><span class="co">#&gt; [1] &quot;A&quot;</span></span>
<span id="cb57-4"><a href="base.html#cb57-4"></a></span>
<span id="cb57-5"><a href="base.html#cb57-5"></a><span class="kw">attributes</span>(heights)</span>
<span id="cb57-6"><a href="base.html#cb57-6"></a><span class="co">#&gt; $names</span></span>
<span id="cb57-7"><a href="base.html#cb57-7"></a><span class="co">#&gt; [1] &quot;Student: 1&quot; &quot;Student: 2&quot; &quot;Student: 3&quot; &quot;Student: 4&quot;</span></span>
<span id="cb57-8"><a href="base.html#cb57-8"></a><span class="co">#&gt; [5] &quot;Student: 5&quot; &quot;Student: 6&quot;</span></span>
<span id="cb57-9"><a href="base.html#cb57-9"></a><span class="co">#&gt; </span></span>
<span id="cb57-10"><a href="base.html#cb57-10"></a><span class="co">#&gt; $`class-name`</span></span>
<span id="cb57-11"><a href="base.html#cb57-11"></a><span class="co">#&gt; [1] &quot;A&quot;</span></span></code></pre></div>
<p>Some functions are very handy when creating vectors, such as <code>rep()</code>, which repeats data.</p>
<div class="sourceCode" id="cb58"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb58-1"><a href="base.html#cb58-1"></a><span class="kw">rep</span>(<span class="dv">1</span><span class="op">:</span><span class="dv">4</span>, <span class="dv">3</span>)</span>
<span id="cb58-2"><a href="base.html#cb58-2"></a><span class="co">#&gt;  [1] 1 2 3 4 1 2 3 4 1 2 3 4</span></span>
<span id="cb58-3"><a href="base.html#cb58-3"></a><span class="kw">rep</span>(<span class="dv">1</span><span class="op">:</span><span class="dv">4</span>, <span class="dt">each =</span> <span class="dv">3</span>)</span>
<span id="cb58-4"><a href="base.html#cb58-4"></a><span class="co">#&gt;  [1] 1 1 1 2 2 2 3 3 3 4 4 4</span></span></code></pre></div>
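<p>Two related sketches, offered as illustration only: <code>seq()</code>, another base R function for generating regular sequences, and the vector form of the <code>times</code> argument of <code>rep()</code>:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># From 1 to 10 in steps of 3
seq(1, 10, by = 3)
#&gt; [1]  1  4  7 10
# Five evenly spaced values from 0 to 1
seq(0, 1, length.out = 5)
#&gt; [1] 0.00 0.25 0.50 0.75 1.00
# Repeat each element a different number of times
rep(c(&quot;a&quot;, &quot;b&quot;), times = c(2, 3))
#&gt; [1] &quot;a&quot; &quot;a&quot; &quot;b&quot; &quot;b&quot; &quot;b&quot;</code></pre></div>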
<p>To update part of a vector, simply assign new values to the extracted subset.</p>
<div class="sourceCode" id="cb59"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb59-1"><a href="base.html#cb59-1"></a>heights2</span>
<span id="cb59-2"><a href="base.html#cb59-2"></a><span class="co">#&gt; [1] 1.70 1.72 1.80 1.66 1.65 1.88</span></span>
<span id="cb59-3"><a href="base.html#cb59-3"></a>heights2[heights2 <span class="op">&gt;</span><span class="st"> </span><span class="fl">1.8</span>] &lt;-<span class="st"> </span><span class="fl">1.8</span></span>
<span id="cb59-4"><a href="base.html#cb59-4"></a>heights2</span>
<span id="cb59-5"><a href="base.html#cb59-5"></a><span class="co">#&gt; [1] 1.70 1.72 1.80 1.66 1.65 1.80</span></span></code></pre></div>
</div>
</div>
<div id="数组与矩阵" class="section level3">
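<h3><span class="header-section-number">2.1.2</span> Arrays and matrices</h3>
<p>The vectors we have seen so far all have a single dimension. If we add dimension information, we get an array. Two-dimensional arrays are the most common and are called matrices.</p>
<p>Create a 2x2x3 array:</p>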
<div class="sourceCode" id="cb60"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb60-1"><a href="base.html#cb60-1"></a><span class="kw">array</span>(<span class="dv">1</span><span class="op">:</span><span class="dv">12</span>, <span class="dt">dim =</span> <span class="kw">c</span>(<span class="dv">2</span>, <span class="dv">2</span>, <span class="dv">3</span>))</span>
<span id="cb60-2"><a href="base.html#cb60-2"></a><span class="co">#&gt; , , 1</span></span>
<span id="cb60-3"><a href="base.html#cb60-3"></a><span class="co">#&gt; </span></span>
<span id="cb60-4"><a href="base.html#cb60-4"></a><span class="co">#&gt;      [,1] [,2]</span></span>
<span id="cb60-5"><a href="base.html#cb60-5"></a><span class="co">#&gt; [1,]    1    3</span></span>
<span id="cb60-6"><a href="base.html#cb60-6"></a><span class="co">#&gt; [2,]    2    4</span></span>
<span id="cb60-7"><a href="base.html#cb60-7"></a><span class="co">#&gt; </span></span>
<span id="cb60-8"><a href="base.html#cb60-8"></a><span class="co">#&gt; , , 2</span></span>
<span id="cb60-9"><a href="base.html#cb60-9"></a><span class="co">#&gt; </span></span>
<span id="cb60-10"><a href="base.html#cb60-10"></a><span class="co">#&gt;      [,1] [,2]</span></span>
<span id="cb60-11"><a href="base.html#cb60-11"></a><span class="co">#&gt; [1,]    5    7</span></span>
<span id="cb60-12"><a href="base.html#cb60-12"></a><span class="co">#&gt; [2,]    6    8</span></span>
<span id="cb60-13"><a href="base.html#cb60-13"></a><span class="co">#&gt; </span></span>
<span id="cb60-14"><a href="base.html#cb60-14"></a><span class="co">#&gt; , , 3</span></span>
<span id="cb60-15"><a href="base.html#cb60-15"></a><span class="co">#&gt; </span></span>
<span id="cb60-16"><a href="base.html#cb60-16"></a><span class="co">#&gt;      [,1] [,2]</span></span>
<span id="cb60-17"><a href="base.html#cb60-17"></a><span class="co">#&gt; [1,]    9   11</span></span>
<span id="cb60-18"><a href="base.html#cb60-18"></a><span class="co">#&gt; [2,]   10   12</span></span></code></pre></div>
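<p>Elements of an array are addressed with one index per dimension. A short sketch, storing the same array in a variable <code>arr</code> (a hypothetical name):</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">arr &lt;- array(1:12, dim = c(2, 2, 3))
dim(arr)
#&gt; [1] 2 2 3
# One index per dimension: row 1, column 2, slice 3
arr[1, 2, 3]
#&gt; [1] 11
# Leaving an index empty keeps that whole dimension: the entire second slice
arr[, , 2]
#&gt;      [,1] [,2]
#&gt; [1,]    5    7
#&gt; [2,]    6    8</code></pre></div>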
<p>Create a 4x3 matrix:</p>
<div class="sourceCode" id="cb61"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb61-1"><a href="base.html#cb61-1"></a><span class="kw">matrix</span>(<span class="dv">1</span><span class="op">:</span><span class="dv">12</span>, <span class="dt">nrow =</span> <span class="dv">4</span>, <span class="dt">ncol =</span> <span class="dv">3</span>, <span class="dt">byrow =</span> <span class="ot">TRUE</span>)</span>
<span id="cb61-2"><a href="base.html#cb61-2"></a><span class="co">#&gt;      [,1] [,2] [,3]</span></span>
<span id="cb61-3"><a href="base.html#cb61-3"></a><span class="co">#&gt; [1,]    1    2    3</span></span>
<span id="cb61-4"><a href="base.html#cb61-4"></a><span class="co">#&gt; [2,]    4    5    6</span></span>
<span id="cb61-5"><a href="base.html#cb61-5"></a><span class="co">#&gt; [3,]    7    8    9</span></span>
<span id="cb61-6"><a href="base.html#cb61-6"></a><span class="co">#&gt; [4,]   10   11   12</span></span></code></pre></div>
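<p>For comparison, here is the same call without <code>byrow = TRUE</code>. By default <code>matrix()</code> fills the values column by column; the variable name <code>m_col</code> is hypothetical:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Default byrow = FALSE: values fill the first column, then the second, and so on
m_col &lt;- matrix(1:12, nrow = 4, ncol = 3)
m_col
#&gt;      [,1] [,2] [,3]
#&gt; [1,]    1    5    9
#&gt; [2,]    2    6   10
#&gt; [3,]    3    7   11
#&gt; [4,]    4    8   12</code></pre></div>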
<p>A matrix has 2 commonly used attributes, row names (<code>rownames</code>) and column names (<code>colnames</code>):</p>
<div class="sourceCode" id="cb62"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb62-1"><a href="base.html#cb62-1"></a>M &lt;-<span class="st"> </span><span class="kw">matrix</span>(<span class="dv">1</span><span class="op">:</span><span class="dv">12</span>, <span class="dt">nrow =</span> <span class="dv">4</span>, <span class="dt">ncol =</span> <span class="dv">3</span>, <span class="dt">byrow =</span> <span class="ot">TRUE</span>)</span>
<span id="cb62-2"><a href="base.html#cb62-2"></a><span class="kw">rownames</span>(M)</span>
<span id="cb62-3"><a href="base.html#cb62-3"></a><span class="co">#&gt; NULL</span></span>
<span id="cb62-4"><a href="base.html#cb62-4"></a><span class="kw">colnames</span>(M)</span>
<span id="cb62-5"><a href="base.html#cb62-5"></a><span class="co">#&gt; NULL</span></span></code></pre></div>
<p>We did not set them when creating the matrix above, so they default to <code>NULL</code> (empty). We can set them ourselves:</p>
<div class="sourceCode" id="cb63"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb63-1"><a href="base.html#cb63-1"></a><span class="kw">rownames</span>(M) &lt;-<span class="st"> </span><span class="kw">paste0</span>(<span class="st">&quot;a&quot;</span>, <span class="dv">1</span><span class="op">:</span><span class="dv">4</span>)</span>
<span id="cb63-2"><a href="base.html#cb63-2"></a><span class="kw">colnames</span>(M) &lt;-<span class="st"> </span><span class="kw">paste0</span>(<span class="st">&quot;b&quot;</span>, <span class="dv">1</span><span class="op">:</span><span class="dv">3</span>)</span>
<span id="cb63-3"><a href="base.html#cb63-3"></a>M</span>
<span id="cb63-4"><a href="base.html#cb63-4"></a><span class="co">#&gt;    b1 b2 b3</span></span>
<span id="cb63-5"><a href="base.html#cb63-5"></a><span class="co">#&gt; a1  1  2  3</span></span>
<span id="cb63-6"><a href="base.html#cb63-6"></a><span class="co">#&gt; a2  4  5  6</span></span>
<span id="cb63-7"><a href="base.html#cb63-7"></a><span class="co">#&gt; a3  7  8  9</span></span>
<span id="cb63-8"><a href="base.html#cb63-8"></a><span class="co">#&gt; a4 10 11 12</span></span></code></pre></div>
<p>We can also get the dimensions of a matrix:</p>
<div class="sourceCode" id="cb64"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb64-1"><a href="base.html#cb64-1"></a><span class="kw">dim</span>(M)</span>
<span id="cb64-2"><a href="base.html#cb64-2"></a><span class="co">#&gt; [1] 4 3</span></span>
<span id="cb64-3"><a href="base.html#cb64-3"></a><span class="co"># Number of rows</span></span>
<span id="cb64-4"><a href="base.html#cb64-4"></a><span class="kw">nrow</span>(M)</span>
<span id="cb64-5"><a href="base.html#cb64-5"></a><span class="co">#&gt; [1] 4</span></span>
<span id="cb64-6"><a href="base.html#cb64-6"></a><span class="co"># Number of columns</span></span>
<span id="cb64-7"><a href="base.html#cb64-7"></a><span class="kw">ncol</span>(M)</span>
<span id="cb64-8"><a href="base.html#cb64-8"></a><span class="co">#&gt; [1] 3</span></span></code></pre></div>
<p>For numeric matrices, several operations are very useful:</p>
<div class="sourceCode" id="cb65"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb65-1"><a href="base.html#cb65-1"></a><span class="co"># Sum of all elements</span></span>
<span id="cb65-2"><a href="base.html#cb65-2"></a><span class="kw">sum</span>(M)</span>
<span id="cb65-3"><a href="base.html#cb65-3"></a><span class="co">#&gt; [1] 78</span></span>
<span id="cb65-4"><a href="base.html#cb65-4"></a><span class="co"># Mean of all elements</span></span>
<span id="cb65-5"><a href="base.html#cb65-5"></a><span class="kw">mean</span>(M)</span>
<span id="cb65-6"><a href="base.html#cb65-6"></a><span class="co">#&gt; [1] 6.5</span></span>
<span id="cb65-7"><a href="base.html#cb65-7"></a><span class="co"># Row sums</span></span>
<span id="cb65-8"><a href="base.html#cb65-8"></a><span class="kw">rowSums</span>(M)</span>
<span id="cb65-9"><a href="base.html#cb65-9"></a><span class="co">#&gt; a1 a2 a3 a4 </span></span>
<span id="cb65-10"><a href="base.html#cb65-10"></a><span class="co">#&gt;  6 15 24 33</span></span>
<span id="cb65-11"><a href="base.html#cb65-11"></a><span class="co"># Column sums</span></span>
<span id="cb65-12"><a href="base.html#cb65-12"></a><span class="kw">colSums</span>(M)</span>
<span id="cb65-13"><a href="base.html#cb65-13"></a><span class="co">#&gt; b1 b2 b3 </span></span>
<span id="cb65-14"><a href="base.html#cb65-14"></a><span class="co">#&gt; 22 26 30</span></span>
<span id="cb65-15"><a href="base.html#cb65-15"></a><span class="co"># Row means</span></span>
<span id="cb65-16"><a href="base.html#cb65-16"></a><span class="kw">rowMeans</span>(M)</span>
<span id="cb65-17"><a href="base.html#cb65-17"></a><span class="co">#&gt; a1 a2 a3 a4 </span></span>
<span id="cb65-18"><a href="base.html#cb65-18"></a><span class="co">#&gt;  2  5  8 11</span></span>
<span id="cb65-19"><a href="base.html#cb65-19"></a><span class="co"># Column means</span></span>
<span id="cb65-20"><a href="base.html#cb65-20"></a><span class="kw">colMeans</span>(M)</span>
<span id="cb65-21"><a href="base.html#cb65-21"></a><span class="co">#&gt;  b1  b2  b3 </span></span>
<span id="cb65-22"><a href="base.html#cb65-22"></a><span class="co">#&gt; 5.5 6.5 7.5</span></span></code></pre></div>
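<p>For row-wise or column-wise summaries that have no dedicated function, base R's <code>apply()</code> is one general option. A sketch on an unnamed copy of the matrix, stored in a hypothetical variable <code>m</code>:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">m &lt;- matrix(1:12, nrow = 4, ncol = 3, byrow = TRUE)
# MARGIN = 1 applies the function to each row
apply(m, 1, max)
#&gt; [1]  3  6  9 12
# MARGIN = 2 applies the function to each column
apply(m, 2, min)
#&gt; [1] 1 2 3
# apply() with sum gives the same values as rowSums()
all(apply(m, 1, sum) == rowSums(m))
#&gt; [1] TRUE</code></pre></div>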
<p>Subsetting still applies; a comma <code>,</code> separates the dimensions:</p>
<div class="sourceCode" id="cb66"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb66-1"><a href="base.html#cb66-1"></a><span class="co"># Element in row 1, column 1</span></span>
<span id="cb66-2"><a href="base.html#cb66-2"></a>M[<span class="dv">1</span>, <span class="dv">1</span>]</span>
<span id="cb66-3"><a href="base.html#cb66-3"></a><span class="co">#&gt; [1] 1</span></span>
<span id="cb66-4"><a href="base.html#cb66-4"></a><span class="co"># Row 1</span></span>
<span id="cb66-5"><a href="base.html#cb66-5"></a>M[<span class="dv">1</span>, ]</span>
<span id="cb66-6"><a href="base.html#cb66-6"></a><span class="co">#&gt; b1 b2 b3 </span></span>
<span id="cb66-7"><a href="base.html#cb66-7"></a><span class="co">#&gt;  1  2  3</span></span>
<span id="cb66-8"><a href="base.html#cb66-8"></a><span class="co"># Column 1</span></span>
<span id="cb66-9"><a href="base.html#cb66-9"></a>M[, <span class="dv">1</span>]</span>
<span id="cb66-10"><a href="base.html#cb66-10"></a><span class="co">#&gt; a1 a2 a3 a4 </span></span>
<span id="cb66-11"><a href="base.html#cb66-11"></a><span class="co">#&gt;  1  4  7 10</span></span>
<span id="cb66-12"><a href="base.html#cb66-12"></a><span class="co"># First 2 rows</span></span>
<span id="cb66-13"><a href="base.html#cb66-13"></a>M[<span class="dv">1</span><span class="op">:</span><span class="dv">2</span>, ]</span>
<span id="cb66-14"><a href="base.html#cb66-14"></a><span class="co">#&gt;    b1 b2 b3</span></span>
<span id="cb66-15"><a href="base.html#cb66-15"></a><span class="co">#&gt; a1  1  2  3</span></span>
<span id="cb66-16"><a href="base.html#cb66-16"></a><span class="co">#&gt; a2  4  5  6</span></span>
<span id="cb66-17"><a href="base.html#cb66-17"></a><span class="co"># First 2 columns</span></span>
<span id="cb66-18"><a href="base.html#cb66-18"></a>M[, <span class="dv">1</span><span class="op">:</span><span class="dv">2</span>]</span>
<span id="cb66-19"><a href="base.html#cb66-19"></a><span class="co">#&gt;    b1 b2</span></span>
<span id="cb66-20"><a href="base.html#cb66-20"></a><span class="co">#&gt; a1  1  2</span></span>
<span id="cb66-21"><a href="base.html#cb66-21"></a><span class="co">#&gt; a2  4  5</span></span>
<span id="cb66-22"><a href="base.html#cb66-22"></a><span class="co">#&gt; a3  7  8</span></span>
<span id="cb66-23"><a href="base.html#cb66-23"></a><span class="co">#&gt; a4 10 11</span></span></code></pre></div>
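<p>When a single row or a single column is extracted, the dimension information is dropped by default and a one-dimensional vector is returned. We can explicitly ask to keep the matrix form, for example:</p>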
<div class="sourceCode" id="cb67"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb67-1"><a href="base.html#cb67-1"></a>M[, <span class="dv">1</span>, drop =<span class="st"> </span><span class="ot">FALSE</span>]</span>
<span id="cb67-2"><a href="base.html#cb67-2"></a><span class="co">#&gt;    b1</span></span>
<span id="cb67-3"><a href="base.html#cb67-3"></a><span class="co">#&gt; a1  1</span></span>
<span id="cb67-4"><a href="base.html#cb67-4"></a><span class="co">#&gt; a2  4</span></span>
<span id="cb67-5"><a href="base.html#cb67-5"></a><span class="co">#&gt; a3  7</span></span>
<span id="cb67-6"><a href="base.html#cb67-6"></a><span class="co">#&gt; a4 10</span></span></code></pre></div>
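<p>As a minimal sketch of what <code>drop</code> changes, the block below rebuilds a matrix like <code>M</code> (the construction is an assumption inferred from the output printed above) and compares the dimensions of the two results.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Assumed construction of a 4 x 3 matrix matching the printed output
M &lt;- matrix(1:12, nrow = 4, byrow = TRUE,
            dimnames = list(paste0(&quot;a&quot;, 1:4), paste0(&quot;b&quot;, 1:3)))

v &lt;- M[, 1]                 # dimensions dropped: a named vector
m &lt;- M[, 1, drop = FALSE]   # dimensions kept: a 4 x 1 matrix

is.null(dim(v))
#&gt; [1] TRUE
dim(m)
#&gt; [1] 4 1</code></pre></div>
<p>Keeping the matrix form is mostly useful when later code expects two dimensions, for instance when it calls <code>nrow()</code> or <code>colnames()</code> on the result.</p>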
<p>Logical indexing works as well:</p>
<div class="sourceCode" id="cb68"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb68-1"><a href="base.html#cb68-1"></a>M[<span class="kw">rowMeans</span>(M) <span class="op">&gt;</span><span class="st"> </span><span class="dv">5</span>, ]</span>
<span id="cb68-2"><a href="base.html#cb68-2"></a><span class="co">#&gt;    b1 b2 b3</span></span>
<span id="cb68-3"><a href="base.html#cb68-3"></a><span class="co">#&gt; a3  7  8  9</span></span>
<span id="cb68-4"><a href="base.html#cb68-4"></a><span class="co">#&gt; a4 10 11 12</span></span></code></pre></div>
</div>
<div id="数据框" class="section level3">
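<h3><span class="header-section-number">2.1.3</span> Data frames</h3>
<p>A data frame (<code>data.frame</code>) is one of R's most distinctive data structures and is well suited to storing and displaying ordinary tabular data. It looks much like a matrix, but unlike a matrix, the columns of a data frame may have different data types.</p>
<p>For example, create a data frame that stores sex, age and height:</p>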
<div class="sourceCode" id="cb69"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb69-1"><a href="base.html#cb69-1"></a>df &lt;-<span class="st"> </span><span class="kw">data.frame</span>(</span>
<span id="cb69-2"><a href="base.html#cb69-2"></a>  <span class="dt">sex =</span> <span class="kw">c</span>(<span class="st">&quot;F&quot;</span>, <span class="st">&quot;M&quot;</span>, <span class="st">&quot;M&quot;</span>, <span class="st">&quot;F&quot;</span>),</span>
<span id="cb69-3"><a href="base.html#cb69-3"></a>  <span class="dt">age =</span> <span class="kw">c</span>(<span class="dv">17</span>, <span class="dv">29</span>, <span class="dv">20</span>, <span class="dv">33</span>),</span>
<span id="cb69-4"><a href="base.html#cb69-4"></a>  <span class="dt">heights =</span> <span class="kw">c</span>(<span class="fl">1.66</span>, <span class="fl">1.84</span>, <span class="fl">1.83</span>, <span class="fl">1.56</span>)</span>
<span id="cb69-5"><a href="base.html#cb69-5"></a>)</span>
<span id="cb69-6"><a href="base.html#cb69-6"></a></span>
<span id="cb69-7"><a href="base.html#cb69-7"></a>df</span>
<span id="cb69-8"><a href="base.html#cb69-8"></a><span class="co">#&gt;   sex age heights</span></span>
<span id="cb69-9"><a href="base.html#cb69-9"></a><span class="co">#&gt; 1   F  17    1.66</span></span>
<span id="cb69-10"><a href="base.html#cb69-10"></a><span class="co">#&gt; 2   M  29    1.84</span></span>
<span id="cb69-11"><a href="base.html#cb69-11"></a><span class="co">#&gt; 3   M  20    1.83</span></span>
<span id="cb69-12"><a href="base.html#cb69-12"></a><span class="co">#&gt; 4   F  33    1.56</span></span></code></pre></div>
<p><code>str()</code> is a convenient way to inspect the structure of complex data types:</p>
<div class="sourceCode" id="cb70"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb70-1"><a href="base.html#cb70-1"></a><span class="kw">str</span>(df)</span>
<span id="cb70-2"><a href="base.html#cb70-2"></a><span class="co">#&gt; &#39;data.frame&#39;:    4 obs. of  3 variables:</span></span>
<span id="cb70-3"><a href="base.html#cb70-3"></a><span class="co">#&gt;  $ sex    : Factor w/ 2 levels &quot;F&quot;,&quot;M&quot;: 1 2 2 1</span></span>
<span id="cb70-4"><a href="base.html#cb70-4"></a><span class="co">#&gt;  $ age    : num  17 29 20 33</span></span>
<span id="cb70-5"><a href="base.html#cb70-5"></a><span class="co">#&gt;  $ heights: num  1.66 1.84 1.83 1.56</span></span></code></pre></div>
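<p>In R versions before 4.0.0, character columns in a data frame are converted to factors by default, as in the output above; since R 4.0.0 the default is <code>stringsAsFactors = FALSE</code>. Either way, we can set the behaviour explicitly:</p>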
<div class="sourceCode" id="cb71"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb71-1"><a href="base.html#cb71-1"></a>df &lt;-<span class="st"> </span><span class="kw">data.frame</span>(</span>
<span id="cb71-2"><a href="base.html#cb71-2"></a>  <span class="dt">sex =</span> <span class="kw">c</span>(<span class="st">&quot;F&quot;</span>, <span class="st">&quot;M&quot;</span>, <span class="st">&quot;M&quot;</span>, <span class="st">&quot;F&quot;</span>),</span>
<span id="cb71-3"><a href="base.html#cb71-3"></a>  <span class="dt">age =</span> <span class="kw">c</span>(<span class="dv">17</span>, <span class="dv">29</span>, <span class="dv">20</span>, <span class="dv">33</span>),</span>
<span id="cb71-4"><a href="base.html#cb71-4"></a>  <span class="dt">heights =</span> <span class="kw">c</span>(<span class="fl">1.66</span>, <span class="fl">1.84</span>, <span class="fl">1.83</span>, <span class="fl">1.56</span>),</span>
<span id="cb71-5"><a href="base.html#cb71-5"></a>  <span class="dt">stringsAsFactors =</span> <span class="ot">FALSE</span></span>
<span id="cb71-6"><a href="base.html#cb71-6"></a>)</span>
<span id="cb71-7"><a href="base.html#cb71-7"></a></span>
<span id="cb71-8"><a href="base.html#cb71-8"></a>df</span>
<span id="cb71-9"><a href="base.html#cb71-9"></a><span class="co">#&gt;   sex age heights</span></span>
<span id="cb71-10"><a href="base.html#cb71-10"></a><span class="co">#&gt; 1   F  17    1.66</span></span>
<span id="cb71-11"><a href="base.html#cb71-11"></a><span class="co">#&gt; 2   M  29    1.84</span></span>
<span id="cb71-12"><a href="base.html#cb71-12"></a><span class="co">#&gt; 3   M  20    1.83</span></span>
<span id="cb71-13"><a href="base.html#cb71-13"></a><span class="co">#&gt; 4   F  33    1.56</span></span>
<span id="cb71-14"><a href="base.html#cb71-14"></a></span>
<span id="cb71-15"><a href="base.html#cb71-15"></a><span class="kw">str</span>(df)</span>
<span id="cb71-16"><a href="base.html#cb71-16"></a><span class="co">#&gt; &#39;data.frame&#39;:    4 obs. of  3 variables:</span></span>
<span id="cb71-17"><a href="base.html#cb71-17"></a><span class="co">#&gt;  $ sex    : chr  &quot;F&quot; &quot;M&quot; &quot;M&quot; &quot;F&quot;</span></span>
<span id="cb71-18"><a href="base.html#cb71-18"></a><span class="co">#&gt;  $ age    : num  17 29 20 33</span></span>
<span id="cb71-19"><a href="base.html#cb71-19"></a><span class="co">#&gt;  $ heights: num  1.66 1.84 1.83 1.56</span></span></code></pre></div>
<p>Many operations that work on matrices also work on data frames.</p>
<p>For example, getting dimension information:</p>
<div class="sourceCode" id="cb72"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb72-1"><a href="base.html#cb72-1"></a><span class="kw">dim</span>(df)</span>
<span id="cb72-2"><a href="base.html#cb72-2"></a><span class="co">#&gt; [1] 4 3</span></span>
<span id="cb72-3"><a href="base.html#cb72-3"></a><span class="co"># Number of rows</span></span>
<span id="cb72-4"><a href="base.html#cb72-4"></a><span class="kw">nrow</span>(df)</span>
<span id="cb72-5"><a href="base.html#cb72-5"></a><span class="co">#&gt; [1] 4</span></span>
<span id="cb72-6"><a href="base.html#cb72-6"></a><span class="co"># Number of columns</span></span>
<span id="cb72-7"><a href="base.html#cb72-7"></a><span class="kw">ncol</span>(df)</span>
<span id="cb72-8"><a href="base.html#cb72-8"></a><span class="co">#&gt; [1] 3</span></span></code></pre></div>
<p>For example, getting and setting row and column names:</p>
<div class="sourceCode" id="cb73"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb73-1"><a href="base.html#cb73-1"></a><span class="kw">rownames</span>(df)</span>
<span id="cb73-2"><a href="base.html#cb73-2"></a><span class="co">#&gt; [1] &quot;1&quot; &quot;2&quot; &quot;3&quot; &quot;4&quot;</span></span>
<span id="cb73-3"><a href="base.html#cb73-3"></a><span class="kw">colnames</span>(df)</span>
<span id="cb73-4"><a href="base.html#cb73-4"></a><span class="co">#&gt; [1] &quot;sex&quot;     &quot;age&quot;     &quot;heights&quot;</span></span>
<span id="cb73-5"><a href="base.html#cb73-5"></a></span>
<span id="cb73-6"><a href="base.html#cb73-6"></a><span class="kw">rownames</span>(df) &lt;-<span class="st"> </span><span class="kw">paste0</span>(<span class="st">&quot;Stu&quot;</span>, <span class="dv">1</span><span class="op">:</span><span class="dv">4</span>)</span>
<span id="cb73-7"><a href="base.html#cb73-7"></a><span class="co"># Convert the column names to upper case</span></span>
<span id="cb73-8"><a href="base.html#cb73-8"></a><span class="kw">colnames</span>(df) &lt;-<span class="st"> </span><span class="kw">toupper</span>(<span class="kw">colnames</span>(df))</span>
<span id="cb73-9"><a href="base.html#cb73-9"></a>df</span>
<span id="cb73-10"><a href="base.html#cb73-10"></a><span class="co">#&gt;      SEX AGE HEIGHTS</span></span>
<span id="cb73-11"><a href="base.html#cb73-11"></a><span class="co">#&gt; Stu1   F  17    1.66</span></span>
<span id="cb73-12"><a href="base.html#cb73-12"></a><span class="co">#&gt; Stu2   M  29    1.84</span></span>
<span id="cb73-13"><a href="base.html#cb73-13"></a><span class="co">#&gt; Stu3   M  20    1.83</span></span>
<span id="cb73-14"><a href="base.html#cb73-14"></a><span class="co">#&gt; Stu4   F  33    1.56</span></span></code></pre></div>
<p>Data frames support several kinds of subsetting: by integer index, by logical index, and by row or column name.</p>
<p>Integer indexing first:</p>
<div class="sourceCode" id="cb74"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb74-1"><a href="base.html#cb74-1"></a>df[<span class="dv">1</span><span class="op">:</span><span class="dv">2</span>, <span class="dv">1</span><span class="op">:</span><span class="dv">2</span>]</span>
<span id="cb74-2"><a href="base.html#cb74-2"></a><span class="co">#&gt;      SEX AGE</span></span>
<span id="cb74-3"><a href="base.html#cb74-3"></a><span class="co">#&gt; Stu1   F  17</span></span>
<span id="cb74-4"><a href="base.html#cb74-4"></a><span class="co">#&gt; Stu2   M  29</span></span></code></pre></div>
<p>Then logical indexing:</p>
<div class="sourceCode" id="cb75"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb75-1"><a href="base.html#cb75-1"></a>df[<span class="kw">c</span>(<span class="ot">TRUE</span>, <span class="ot">TRUE</span>, <span class="ot">FALSE</span>, <span class="ot">FALSE</span>), <span class="kw">c</span>(<span class="ot">TRUE</span>, <span class="ot">TRUE</span>, <span class="ot">FALSE</span>)]</span>
<span id="cb75-2"><a href="base.html#cb75-2"></a><span class="co">#&gt;      SEX AGE</span></span>
<span id="cb75-3"><a href="base.html#cb75-3"></a><span class="co">#&gt; Stu1   F  17</span></span>
<span id="cb75-4"><a href="base.html#cb75-4"></a><span class="co">#&gt; Stu2   M  29</span></span>
<span id="cb75-5"><a href="base.html#cb75-5"></a><span class="co"># Equivalent to</span></span>
<span id="cb75-6"><a href="base.html#cb75-6"></a>df[<span class="kw">rownames</span>(df) <span class="op">%in%</span><span class="st"> </span><span class="kw">c</span>(<span class="st">&quot;Stu1&quot;</span>, <span class="st">&quot;Stu2&quot;</span>), <span class="kw">colnames</span>(df) <span class="op">%in%</span><span class="st"> </span><span class="kw">c</span>(<span class="st">&quot;SEX&quot;</span>, <span class="st">&quot;AGE&quot;</span>)]</span>
<span id="cb75-7"><a href="base.html#cb75-7"></a><span class="co">#&gt;      SEX AGE</span></span>
<span id="cb75-8"><a href="base.html#cb75-8"></a><span class="co">#&gt; Stu1   F  17</span></span>
<span id="cb75-9"><a href="base.html#cb75-9"></a><span class="co">#&gt; Stu2   M  29</span></span></code></pre></div>
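<p>Here the <code>%in%</code> operator tests membership: <code>'a' %in% c('a', 'b')</code> checks whether <code>'a'</code> occurs in the character vector <code>c('a', 'b')</code>. The second form looks more verbose, but it is the one more commonly used in practice.</p>
<p>We can also use the names directly:</p>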
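<p>To make the membership test concrete, here is a small sketch. The data frame is a hypothetical stand-in for the one used in this section, and filtering on <code>AGE</code> is an illustrative choice rather than something taken from the text.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># %in% returns one logical value per element on its left-hand side
c(&quot;a&quot;, &quot;z&quot;) %in% c(&quot;a&quot;, &quot;b&quot;)
#&gt; [1]  TRUE FALSE

# Hypothetical data frame mirroring the one in this section
df &lt;- data.frame(
  SEX = c(&quot;F&quot;, &quot;M&quot;, &quot;M&quot;, &quot;F&quot;),
  AGE = c(17, 29, 20, 33),
  stringsAsFactors = FALSE
)

# The logical vector produced by %in% selects the matching rows
picked &lt;- df[df$AGE %in% c(17, 20), ]
picked
#&gt;   SEX AGE
#&gt; 1   F  17
#&gt; 3   M  20</code></pre></div>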
<div class="sourceCode" id="cb76"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb76-1"><a href="base.html#cb76-1"></a>df[<span class="kw">c</span>(<span class="st">&quot;Stu1&quot;</span>, <span class="st">&quot;Stu2&quot;</span>), <span class="kw">c</span>(<span class="st">&quot;SEX&quot;</span>, <span class="st">&quot;AGE&quot;</span>)]</span>
<span id="cb76-2"><a href="base.html#cb76-2"></a><span class="co">#&gt;      SEX AGE</span></span>
<span id="cb76-3"><a href="base.html#cb76-3"></a><span class="co">#&gt; Stu1   F  17</span></span>
<span id="cb76-4"><a href="base.html#cb76-4"></a><span class="co">#&gt; Stu2   M  29</span></span></code></pre></div>
<p>Extracting a single column as a vector is a common operation; two operators are available for it, <code>[[]]</code> and <code>$</code>.</p>
<p>For example, to extract the <code>SEX</code> column:</p>
<div class="sourceCode" id="cb77"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb77-1"><a href="base.html#cb77-1"></a>df[[<span class="dv">1</span>]]</span>
<span id="cb77-2"><a href="base.html#cb77-2"></a><span class="co">#&gt; [1] &quot;F&quot; &quot;M&quot; &quot;M&quot; &quot;F&quot;</span></span>
<span id="cb77-3"><a href="base.html#cb77-3"></a>df[[<span class="st">&quot;SEX&quot;</span>]]</span>
<span id="cb77-4"><a href="base.html#cb77-4"></a><span class="co">#&gt; [1] &quot;F&quot; &quot;M&quot; &quot;M&quot; &quot;F&quot;</span></span>
<span id="cb77-5"><a href="base.html#cb77-5"></a>df<span class="op">$</span>SEX</span>
<span id="cb77-6"><a href="base.html#cb77-6"></a><span class="co">#&gt; [1] &quot;F&quot; &quot;M&quot; &quot;M&quot; &quot;F&quot;</span></span></code></pre></div>
<p>Note the difference between <code>[[]]</code> and <code>[]</code>: the latter still returns a data frame:</p>
<div class="sourceCode" id="cb78"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb78-1"><a href="base.html#cb78-1"></a>df[<span class="st">&#39;SEX&#39;</span>]</span>
<span id="cb78-2"><a href="base.html#cb78-2"></a><span class="co">#&gt;      SEX</span></span>
<span id="cb78-3"><a href="base.html#cb78-3"></a><span class="co">#&gt; Stu1   F</span></span>
<span id="cb78-4"><a href="base.html#cb78-4"></a><span class="co">#&gt; Stu2   M</span></span>
<span id="cb78-5"><a href="base.html#cb78-5"></a><span class="co">#&gt; Stu3   M</span></span>
<span id="cb78-6"><a href="base.html#cb78-6"></a><span class="co">#&gt; Stu4   F</span></span></code></pre></div>
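<p>A quick way to see the difference is to check the type of each result. The sketch below uses a hypothetical one-column data frame so that it runs on its own.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical one-column data frame for illustration
df &lt;- data.frame(SEX = c(&quot;F&quot;, &quot;M&quot;, &quot;M&quot;, &quot;F&quot;), stringsAsFactors = FALSE)

col_df  &lt;- df[&quot;SEX&quot;]    # single-column data frame
col_vec &lt;- df[[&quot;SEX&quot;]]  # plain character vector

is.data.frame(col_df)
#&gt; [1] TRUE
is.data.frame(col_vec)
#&gt; [1] FALSE
length(col_vec)
#&gt; [1] 4</code></pre></div>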
<p>Subsetting can also be done with the <code>subset()</code> function that R provides:</p>
<div class="sourceCode" id="cb79"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb79-1"><a href="base.html#cb79-1"></a><span class="co"># Select rows</span></span>
<span id="cb79-2"><a href="base.html#cb79-2"></a><span class="kw">subset</span>(df, <span class="dt">subset =</span> <span class="kw">rownames</span>(df) <span class="op">%in%</span><span class="st"> </span><span class="kw">c</span>(<span class="st">&quot;Stu1&quot;</span>, <span class="st">&quot;Stu2&quot;</span>))</span>
<span id="cb79-3"><a href="base.html#cb79-3"></a><span class="co">#&gt;      SEX AGE HEIGHTS</span></span>
<span id="cb79-4"><a href="base.html#cb79-4"></a><span class="co">#&gt; Stu1   F  17    1.66</span></span>
<span id="cb79-5"><a href="base.html#cb79-5"></a><span class="co">#&gt; Stu2   M  29    1.84</span></span>
<span id="cb79-6"><a href="base.html#cb79-6"></a><span class="co"># Select columns</span></span>
<span id="cb79-7"><a href="base.html#cb79-7"></a><span class="kw">subset</span>(df, <span class="dt">select =</span> <span class="kw">colnames</span>(df) <span class="op">==</span><span class="st"> &quot;SEX&quot;</span>)</span>
<span id="cb79-8"><a href="base.html#cb79-8"></a><span class="co">#&gt;      SEX</span></span>
<span id="cb79-9"><a href="base.html#cb79-9"></a><span class="co">#&gt; Stu1   F</span></span>
<span id="cb79-10"><a href="base.html#cb79-10"></a><span class="co">#&gt; Stu2   M</span></span>
<span id="cb79-11"><a href="base.html#cb79-11"></a><span class="co">#&gt; Stu3   M</span></span>
<span id="cb79-12"><a href="base.html#cb79-12"></a><span class="co">#&gt; Stu4   F</span></span>
<span id="cb79-13"><a href="base.html#cb79-13"></a><span class="co"># Filter rows and columns at the same time</span></span>
<span id="cb79-14"><a href="base.html#cb79-14"></a><span class="kw">subset</span>(df, <span class="dt">subset =</span> <span class="kw">rownames</span>(df) <span class="op">%in%</span><span class="st"> </span><span class="kw">c</span>(<span class="st">&quot;Stu1&quot;</span>, <span class="st">&quot;Stu2&quot;</span>),</span>
<span id="cb79-15"><a href="base.html#cb79-15"></a>           <span class="dt">select =</span> <span class="kw">colnames</span>(df) <span class="op">==</span><span class="st"> &quot;SEX&quot;</span>)</span>
<span id="cb79-16"><a href="base.html#cb79-16"></a><span class="co">#&gt;      SEX</span></span>
<span id="cb79-17"><a href="base.html#cb79-17"></a><span class="co">#&gt; Stu1   F</span></span>
<span id="cb79-18"><a href="base.html#cb79-18"></a><span class="co">#&gt; Stu2   M</span></span></code></pre></div>
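<p><code>subset()</code> also evaluates its <code>subset</code> and <code>select</code> arguments inside the data frame, so column names can be written directly. The following is a sketch on a rebuilt copy of the data; the variable name <code>adults</code> and the age threshold are illustrative assumptions.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Rebuilt copy of the data frame used in this section
df &lt;- data.frame(
  SEX = c(&quot;F&quot;, &quot;M&quot;, &quot;M&quot;, &quot;F&quot;),
  AGE = c(17, 29, 20, 33),
  HEIGHTS = c(1.66, 1.84, 1.83, 1.56),
  row.names = paste0(&quot;Stu&quot;, 1:4),
  stringsAsFactors = FALSE
)

# Column names are used directly, without df$ or quotes
adults &lt;- subset(df, subset = AGE &gt; 18, select = c(SEX, HEIGHTS))
adults
#&gt;      SEX HEIGHTS
#&gt; Stu2   M    1.84
#&gt; Stu3   M    1.83
#&gt; Stu4   F    1.56</code></pre></div>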
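<p>To modify or update a column of a data frame, simply assign to it as you would to a vector; a single value is recycled across all rows:</p>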
<div class="sourceCode" id="cb80"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb80-1"><a href="base.html#cb80-1"></a>df<span class="op">$</span>AGE &lt;-<span class="st"> </span><span class="dv">20</span></span>
<span id="cb80-2"><a href="base.html#cb80-2"></a>df</span>
<span id="cb80-3"><a href="base.html#cb80-3"></a><span class="co">#&gt;      SEX AGE HEIGHTS</span></span>
<span id="cb80-4"><a href="base.html#cb80-4"></a><span class="co">#&gt; Stu1   F  20    1.66</span></span>
<span id="cb80-5"><a href="base.html#cb80-5"></a><span class="co">#&gt; Stu2   M  20    1.84</span></span>
<span id="cb80-6"><a href="base.html#cb80-6"></a><span class="co">#&gt; Stu3   M  20    1.83</span></span>
<span id="cb80-7"><a href="base.html#cb80-7"></a><span class="co">#&gt; Stu4   F  20    1.56</span></span></code></pre></div>
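<p>The same assignment syntax can update only some rows or add a new column. This is a minimal sketch on a hypothetical copy of the data; the <code>ADULT</code> column is invented for illustration.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical copy of the data frame
df &lt;- data.frame(
  SEX = c(&quot;F&quot;, &quot;M&quot;, &quot;M&quot;, &quot;F&quot;),
  AGE = c(17, 29, 20, 33),
  stringsAsFactors = FALSE
)

# Update only the rows that satisfy a condition
df$AGE[df$SEX == &quot;F&quot;] &lt;- 18

# Assigning to a name that does not exist yet adds a new column
df$ADULT &lt;- df$AGE &gt;= 18
df
#&gt;   SEX AGE ADULT
#&gt; 1   F  18  TRUE
#&gt; 2   M  29  TRUE
#&gt; 3   M  20  TRUE
#&gt; 4   F  18  TRUE</code></pre></div>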
</div>
<div id="列表" class="section level3">
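<h3><span class="header-section-number">2.1.4</span> Lists</h3>
<p>Lists can represent <strong>very, very, very complex</strong> data structures. A data frame can be viewed as a special case of a list in which every element (column) has the same length.</p>
<p>Create a list as follows:</p>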
<div class="sourceCode" id="cb81"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb81-1"><a href="base.html#cb81-1"></a>l &lt;-<span class="st"> </span><span class="kw">list</span>(</span>
<span id="cb81-2"><a href="base.html#cb81-2"></a>  <span class="dt">sex =</span> <span class="kw">c</span>(<span class="st">&quot;F&quot;</span>, <span class="st">&quot;M&quot;</span>),</span>
<span id="cb81-3"><a href="base.html#cb81-3"></a>  <span class="dt">age =</span> <span class="kw">c</span>(<span class="dv">17</span>, <span class="dv">29</span>, <span class="dv">20</span>),</span>
<span id="cb81-4"><a href="base.html#cb81-4"></a>  <span class="dt">heights =</span> <span class="kw">c</span>(<span class="fl">1.66</span>, <span class="fl">1.84</span>, <span class="fl">1.83</span>, <span class="fl">1.56</span>)</span>
<span id="cb81-5"><a href="base.html#cb81-5"></a>)</span>
<span id="cb81-6"><a href="base.html#cb81-6"></a>l</span>
<span id="cb81-7"><a href="base.html#cb81-7"></a><span class="co">#&gt; $sex</span></span>
<span id="cb81-8"><a href="base.html#cb81-8"></a><span class="co">#&gt; [1] &quot;F&quot; &quot;M&quot;</span></span>
<span id="cb81-9"><a href="base.html#cb81-9"></a><span class="co">#&gt; </span></span>
<span id="cb81-10"><a href="base.html#cb81-10"></a><span class="co">#&gt; $age</span></span>
<span id="cb81-11"><a href="base.html#cb81-11"></a><span class="co">#&gt; [1] 17 29 20</span></span>
<span id="cb81-12"><a href="base.html#cb81-12"></a><span class="co">#&gt; </span></span>
<span id="cb81-13"><a href="base.html#cb81-13"></a><span class="co">#&gt; $heights</span></span>
<span id="cb81-14"><a href="base.html#cb81-14"></a><span class="co">#&gt; [1] 1.66 1.84 1.83 1.56</span></span></code></pre></div>
<p>The printed output already shows how to extract the individual components:</p>
<div class="sourceCode" id="cb82"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb82-1"><a href="base.html#cb82-1"></a>l<span class="op">$</span>sex</span>
<span id="cb82-2"><a href="base.html#cb82-2"></a><span class="co">#&gt; [1] &quot;F&quot; &quot;M&quot;</span></span>
<span id="cb82-3"><a href="base.html#cb82-3"></a>l<span class="op">$</span>heights</span>
<span id="cb82-4"><a href="base.html#cb82-4"></a><span class="co">#&gt; [1] 1.66 1.84 1.83 1.56</span></span></code></pre></div>
<p>A list only has a <code>names</code> attribute; it has no row or column names:</p>
<div class="sourceCode" id="cb83"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb83-1"><a href="base.html#cb83-1"></a><span class="kw">names</span>(l)</span>
<span id="cb83-2"><a href="base.html#cb83-2"></a><span class="co">#&gt; [1] &quot;sex&quot;     &quot;age&quot;     &quot;heights&quot;</span></span></code></pre></div>
<p>As with data frames, subsetting with <code>[[]]</code> returns a single list element, whereas <code>[]</code> returns a sub-list.</p>
<div class="sourceCode" id="cb84"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb84-1"><a href="base.html#cb84-1"></a>l[<span class="st">&#39;sex&#39;</span>]</span>
<span id="cb84-2"><a href="base.html#cb84-2"></a><span class="co">#&gt; $sex</span></span>
<span id="cb84-3"><a href="base.html#cb84-3"></a><span class="co">#&gt; [1] &quot;F&quot; &quot;M&quot;</span></span>
<span id="cb84-4"><a href="base.html#cb84-4"></a>l[[<span class="st">&#39;sex&#39;</span>]]</span>
<span id="cb84-5"><a href="base.html#cb84-5"></a><span class="co">#&gt; [1] &quot;F&quot; &quot;M&quot;</span></span></code></pre></div>
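<p>Because <code>[]</code> always returns a list, it can select several elements at once, while <code>[[]]</code> extracts exactly one. A short sketch, rebuilding the list from above so that it runs on its own:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">l &lt;- list(
  sex = c(&quot;F&quot;, &quot;M&quot;),
  age = c(17, 29, 20),
  heights = c(1.66, 1.84, 1.83, 1.56)
)

sub &lt;- l[c(&quot;sex&quot;, &quot;age&quot;)]  # sub-list with two elements
is.list(sub)
#&gt; [1] TRUE
length(sub)
#&gt; [1] 2

is.list(l[[&quot;sex&quot;]])         # the element itself is a character vector
#&gt; [1] FALSE</code></pre></div>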
<p>Lists can be nested. Below we store a copy of the list <code>l</code> inside another copy of itself:</p>
<div class="sourceCode" id="cb85"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb85-1"><a href="base.html#cb85-1"></a>l2 &lt;-<span class="st"> </span>l</span>
<span id="cb85-2"><a href="base.html#cb85-2"></a>l2<span class="op">$</span>l1 &lt;-<span class="st"> </span>l</span>
<span id="cb85-3"><a href="base.html#cb85-3"></a>l2</span>
<span id="cb85-4"><a href="base.html#cb85-4"></a><span class="co">#&gt; $sex</span></span>
<span id="cb85-5"><a href="base.html#cb85-5"></a><span class="co">#&gt; [1] &quot;F&quot; &quot;M&quot;</span></span>
<span id="cb85-6"><a href="base.html#cb85-6"></a><span class="co">#&gt; </span></span>
<span id="cb85-7"><a href="base.html#cb85-7"></a><span class="co">#&gt; $age</span></span>
<span id="cb85-8"><a href="base.html#cb85-8"></a><span class="co">#&gt; [1] 17 29 20</span></span>
<span id="cb85-9"><a href="base.html#cb85-9"></a><span class="co">#&gt; </span></span>
<span id="cb85-10"><a href="base.html#cb85-10"></a><span class="co">#&gt; $heights</span></span>
<span id="cb85-11"><a href="base.html#cb85-11"></a><span class="co">#&gt; [1] 1.66 1.84 1.83 1.56</span></span>
<span id="cb85-12"><a href="base.html#cb85-12"></a><span class="co">#&gt; </span></span>
<span id="cb85-13"><a href="base.html#cb85-13"></a><span class="co">#&gt; $l1</span></span>
<span id="cb85-14"><a href="base.html#cb85-14"></a><span class="co">#&gt; $l1$sex</span></span>
<span id="cb85-15"><a href="base.html#cb85-15"></a><span class="co">#&gt; [1] &quot;F&quot; &quot;M&quot;</span></span>
<span id="cb85-16"><a href="base.html#cb85-16"></a><span class="co">#&gt; </span></span>
<span id="cb85-17"><a href="base.html#cb85-17"></a><span class="co">#&gt; $l1$age</span></span>
<span id="cb85-18"><a href="base.html#cb85-18"></a><span class="co">#&gt; [1] 17 29 20</span></span>
<span id="cb85-19"><a href="base.html#cb85-19"></a><span class="co">#&gt; </span></span>
<span id="cb85-20"><a href="base.html#cb85-20"></a><span class="co">#&gt; $l1$heights</span></span>
<span id="cb85-21"><a href="base.html#cb85-21"></a><span class="co">#&gt; [1] 1.66 1.84 1.83 1.56</span></span></code></pre></div>
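<p>Elements of a nested list can be reached by chaining <code>$</code> or <code>[[]]</code>. The sketch below rebuilds <code>l</code> and <code>l2</code> as above and walks one level down.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">l &lt;- list(
  sex = c(&quot;F&quot;, &quot;M&quot;),
  age = c(17, 29, 20),
  heights = c(1.66, 1.84, 1.83, 1.56)
)
l2 &lt;- l
l2$l1 &lt;- l

# Chain the operators to step into the inner list
l2$l1$sex
#&gt; [1] &quot;F&quot; &quot;M&quot;
l2[[&quot;l1&quot;]][[&quot;age&quot;]][2]
#&gt; [1] 29

# The outer list now has four elements, the last of which is itself a list
length(l2)
#&gt; [1] 4</code></pre></div>
<p>For deeper structures, <code>str(l2)</code> gives a compact overview of the nesting.</p>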
</div>
</div>
<div id="控制结构" class="section level2">
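<h2><span class="header-section-number">2.2</span> Control structures</h2>
<p>In data analysis we can rarely get the job done by simply running commands one after another. To handle more complex program logic and to reduce the amount of code, we need to learn how to use conditionals and loops.</p>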
<div id="条件控制" class="section level3">
<h3><span class="header-section-number">2.2.1</span> Conditionals</h3>
<div id="if-语句" class="section level4">
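<h4><span class="header-section-number">2.2.1.1</span> The if statement</h4>
<p>The if statement is the most commonly used conditional structure. It consists of the if keyword, a condition, and a code block:</p>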
<div class="sourceCode" id="cb86"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb86-1"><a href="base.html#cb86-1"></a>age &lt;-<span class="st">  </span><span class="dv">20</span></span>
<span id="cb86-2"><a href="base.html#cb86-2"></a><span class="cf">if</span> (age <span class="op">&gt;</span><span class="st"> </span><span class="dv">18</span>) {</span>
<span id="cb86-3"><a href="base.html#cb86-3"></a>  <span class="co"># If the condition evaluates to TRUE,</span></span>
<span id="cb86-4"><a href="base.html#cb86-4"></a>  <span class="co"># the statements in this block are executed</span></span>
<span id="cb86-5"><a href="base.html#cb86-5"></a>  <span class="kw">message</span>(<span class="st">&quot;You are an adult now!&quot;</span>)</span>
<span id="cb86-6"><a href="base.html#cb86-6"></a>}</span>
<span id="cb86-7"><a href="base.html#cb86-7"></a><span class="co">#&gt; You are an adult now!</span></span></code></pre></div>
<p>The condition must evaluate to a single logical value, <code>TRUE</code> or <code>FALSE</code>. If it is <code>TRUE</code>, the code block wrapped in <code>{}</code> that follows is executed. To handle the <code>FALSE</code> case, add an optional else block.</p>
<div class="sourceCode" id="cb87"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb87-1"><a href="base.html#cb87-1"></a>age &lt;-<span class="st"> </span><span class="dv">16</span></span>
<span id="cb87-2"><a href="base.html#cb87-2"></a><span class="cf">if</span> (age <span class="op">&gt;</span><span class="st"> </span><span class="dv">18</span>) {</span>
<span id="cb87-3"><a href="base.html#cb87-3"></a>  <span class="co"># Executed when TRUE</span></span>
<span id="cb87-4"><a href="base.html#cb87-4"></a>  <span class="kw">message</span>(<span class="st">&quot;You are an adult now!&quot;</span>)</span>
<span id="cb87-5"><a href="base.html#cb87-5"></a>} <span class="cf">else</span> {</span>
<span id="cb87-6"><a href="base.html#cb87-6"></a>  <span class="co"># Executed when FALSE</span></span>
<span id="cb87-7"><a href="base.html#cb87-7"></a>  <span class="kw">message</span>(<span class="st">&quot;You are still a kid!&quot;</span>)</span>
<span id="cb87-8"><a href="base.html#cb87-8"></a>}</span>
<span id="cb87-9"><a href="base.html#cb87-9"></a><span class="co">#&gt; You are still a kid!</span></span></code></pre></div>
<p>A code block may contain arbitrary code, so if-else statements can be nested. The structure looks like this:</p>
<div class="sourceCode" id="cb88"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb88-1"><a href="base.html#cb88-1"></a><span class="cf">if</span> () {</span>
<span id="cb88-2"><a href="base.html#cb88-2"></a>  <span class="cf">if</span> () {</span>
<span id="cb88-3"><a href="base.html#cb88-3"></a>    </span>
<span id="cb88-4"><a href="base.html#cb88-4"></a>  } <span class="cf">else</span> {</span>
<span id="cb88-5"><a href="base.html#cb88-5"></a>    </span>
<span id="cb88-6"><a href="base.html#cb88-6"></a>  }</span>
<span id="cb88-7"><a href="base.html#cb88-7"></a>} <span class="cf">else</span> {</span>
<span id="cb88-8"><a href="base.html#cb88-8"></a>  </span>
<span id="cb88-9"><a href="base.html#cb88-9"></a>}</span></code></pre></div>
<p>When there are more than two cases to handle, if-else statements can be chained. For example:</p>
<div class="sourceCode" id="cb89"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb89-1"><a href="base.html#cb89-1"></a>age &lt;-<span class="st"> </span><span class="dv">17</span></span>
<span id="cb89-2"><a href="base.html#cb89-2"></a><span class="cf">if</span> (age <span class="op">&gt;</span><span class="st"> </span><span class="dv">18</span>) {</span>
<span id="cb89-3"><a href="base.html#cb89-3"></a>  <span class="kw">message</span>(<span class="st">&quot;You are an adult now!&quot;</span>)</span>
<span id="cb89-4"><a href="base.html#cb89-4"></a>} <span class="cf">else</span> <span class="cf">if</span> (age <span class="op">&lt;</span><span class="st"> </span><span class="dv">17</span>) {</span>
<span id="cb89-5"><a href="base.html#cb89-5"></a>  <span class="kw">message</span>(<span class="st">&quot;You are still a kid!&quot;</span>)</span>
<span id="cb89-6"><a href="base.html#cb89-6"></a>} <span class="cf">else</span> {</span>
<span id="cb89-7"><a href="base.html#cb89-7"></a>  <span class="kw">message</span>(<span class="st">&quot;Congratulations, you are almost an adult!&quot;</span>)</span>
<span id="cb89-8"><a href="base.html#cb89-8"></a>}</span>
<span id="cb89-9"><a href="base.html#cb89-9"></a><span class="co">#&gt; Congratulations, you are almost an adult!</span></span></code></pre></div>
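<p>A chained if-else is easier to reuse and to check when it is wrapped in a function. The sketch below assumes a hypothetical helper, <code>describe_age()</code>, which is not part of the original text and returns a label instead of printing a message.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># describe_age() is a hypothetical helper used only for illustration
describe_age &lt;- function(age) {
  if (age &gt; 18) {
    &quot;adult&quot;
  } else if (age &lt; 17) {
    &quot;child&quot;
  } else {
    &quot;almost an adult&quot;
  }
}

describe_age(20)
#&gt; [1] &quot;adult&quot;
describe_age(17)
#&gt; [1] &quot;almost an adult&quot;
describe_age(10)
#&gt; [1] &quot;child&quot;</code></pre></div>
<p>The branches are tested from top to bottom, and only the first one whose condition is <code>TRUE</code> runs.</p>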
</div>
<div id="switch-语句" class="section level4">
<h4><span class="header-section-number">2.2.1.2</span> The switch statement</h4>
<p>R does have a switch statement, but readers will rarely see or use it. Its structure is:</p>
<div class="sourceCode" id="cb90"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb90-1"><a href="base.html#cb90-1"></a><span class="cf">switch</span>(EXPR, ...)</span></code></pre></div>
<p>Here <code>EXPR</code> is an expression, and <code>...</code> means that named arguments can be supplied.</p>
<p>Here is just one simple example:</p>
<div class="sourceCode" id="cb91"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb91-1"><a href="base.html#cb91-1"></a>ch &lt;-<span class="st"> </span><span class="kw">c</span>(<span class="st">&quot;b&quot;</span>)</span>
<span id="cb91-2"><a href="base.html#cb91-2"></a><span class="kw">cat</span>(ch,<span class="st">&quot;:&quot;</span>, <span class="cf">switch</span>(<span class="dt">EXPR =</span> ch, <span class="dt">a =</span> <span class="dv">1</span>, <span class="dt">b =</span> <span class="dv">2</span><span class="op">:</span><span class="dv">3</span>), <span class="st">&quot;</span><span class="ch">\n</span><span class="st">&quot;</span>)</span>
<span id="cb91-3"><a href="base.html#cb91-3"></a><span class="co">#&gt; b : 2 3</span></span></code></pre></div>
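<p>switch becomes considerably more powerful when combined with functional programming; in other settings I have rarely seen it used. I therefore suggest that beginners simply be aware of it without needing to master it. Of course, if you come across a situation that suits it well, do give it a try; it may well make your code more concise and effective.</p>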
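<p>As one illustration of a situation that suits switch, the sketch below dispatches on a string inside a function. The helper <code>centre()</code> is an assumption made for this example; the trailing unnamed argument acts as the default branch when no name matches.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># centre() is a hypothetical helper used only for illustration
centre &lt;- function(x, type) {
  switch(type,
         mean = mean(x),
         median = median(x),
         # Unnamed last argument: evaluated when no name matches
         stop(&quot;unknown type: &quot;, type))
}

x &lt;- c(1, 2, 10)
centre(x, &quot;median&quot;)
#&gt; [1] 2
centre(x, &quot;mean&quot;)
#&gt; [1] 4.333333</code></pre></div>
<p>Compared with a chain of if-else branches, this form keeps each option on one line, which is convenient when the choice depends on a single string value.</p>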
</div>
<div id="提示信息" class="section level4">
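<h4><span class="header-section-number">2.2.1.3</span> Informational messages</h4>
<p>When writing a program, printing a few messages makes it easier to see whether the program is running as expected. This is a technique beginners should learn: it helps to avoid mistakes and to debug them.</p>
<p>R can emit messages through <code>print()</code>, <code>message()</code>, <code>cat()</code>, <code>warning()</code> and <code>stop()</code>; only <code>stop()</code> terminates the program.</p>
<p>Compare the first four using the output below:</p>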
<div class="sourceCode" id="cb92"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb92-1"><a href="base.html#cb92-1"></a><span class="kw">print</span>(<span class="st">&quot;Running...&quot;</span>)</span>
<span id="cb92-2"><a href="base.html#cb92-2"></a><span class="co">#&gt; [1] &quot;Running...&quot;</span></span>
<span id="cb92-3"><a href="base.html#cb92-3"></a><span class="kw">message</span>(<span class="st">&quot;Running...&quot;</span>)</span>
<span id="cb92-4"><a href="base.html#cb92-4"></a><span class="co">#&gt; Running...</span></span>
<span id="cb92-5"><a href="base.html#cb92-5"></a><span class="kw">cat</span>(<span class="st">&quot;Running...</span><span class="ch">\n</span><span class="st">&quot;</span>)</span>
<span id="cb92-6"><a href="base.html#cb92-6"></a><span class="co">#&gt; Running...</span></span>
<span id="cb92-7"><a href="base.html#cb92-7"></a><span class="kw">warning</span>(<span class="st">&quot;Running...&quot;</span>)</span>
<span id="cb92-8"><a href="base.html#cb92-8"></a><span class="co">#&gt; Warning: Running...</span></span></code></pre></div>
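<p><code>cat()</code> and <code>message()</code> look similar, but the output of <code>cat()</code> cannot be suppressed in the same way, and <code>cat()</code> does not append a newline by default. Output from <code>message()</code> and <code>warning()</code>, by contrast, can be suppressed, as shown below:</p>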
<div class="sourceCode" id="cb93"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb93-1"><a href="base.html#cb93-1"></a><span class="kw">message</span>(<span class="st">&quot;Running...&quot;</span>)</span>
<span id="cb93-2"><a href="base.html#cb93-2"></a><span class="co">#&gt; Running...</span></span>
<span id="cb93-3"><a href="base.html#cb93-3"></a><span class="kw">suppressMessages</span>(<span class="kw">message</span>(<span class="st">&quot;Running...&quot;</span>))</span>
<span id="cb93-4"><a href="base.html#cb93-4"></a></span>
<span id="cb93-5"><a href="base.html#cb93-5"></a><span class="kw">warning</span>(<span class="st">&quot;Running...&quot;</span>)</span>
<span id="cb93-6"><a href="base.html#cb93-6"></a><span class="co">#&gt; Warning: Running...</span></span>
<span id="cb93-7"><a href="base.html#cb93-7"></a><span class="kw">suppressWarnings</span>(<span class="kw">warning</span>(<span class="st">&quot;Running...&quot;</span>))</span></code></pre></div>
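<p>To see the difference between <code>cat()</code> and <code>message()</code> more concretely, here is a small sketch. The function <code>report()</code> is a hypothetical helper used only for illustration; <code>capture.output()</code> is a base R function that collects what is written to standard output:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># report() is a hypothetical helper used only for illustration
report &lt;- function() {
  cat(&quot;from cat\n&quot;)        # written to standard output
  message(&quot;from message&quot;)  # signalled as a message (shown on standard error)
  invisible(NULL)
}

# capture.output() collects standard output; suppressMessages() silences message()
out &lt;- capture.output(suppressMessages(report()))
out
#&gt; [1] &quot;from cat&quot;</code></pre></div>
<p>Only the text written by <code>cat()</code> survives, which is one reason to prefer <code>message()</code> for status output that users may want to switch off.</p>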
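<p>Now let us look at <code>stop()</code>. It terminates the program immediately, which effectively prevents correct code from being run on the wrong data.</p>
<p>For example, computing a mean requires numeric data, but here we pass in character strings. The check below therefore stops with an error whose message, 无法对字符串计算！, means "cannot compute on character strings":</p>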
<div class="sourceCode" id="cb94"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb94-1"><a href="base.html#cb94-1"></a>heights_str &lt;-<span class="st"> </span><span class="kw">as.character</span>(heights)</span>
<span id="cb94-2"><a href="base.html#cb94-2"></a></span>
<span id="cb94-3"><a href="base.html#cb94-3"></a><span class="cf">if</span> (<span class="op">!</span><span class="kw">is.numeric</span>(heights_str)) {</span>
<span id="cb94-4"><a href="base.html#cb94-4"></a>  <span class="kw">stop</span>(<span class="st">&quot;无法对字符串计算！&quot;</span>)</span>
<span id="cb94-5"><a href="base.html#cb94-5"></a>} <span class="cf">else</span> {</span>
<span id="cb94-6"><a href="base.html#cb94-6"></a>  <span class="co"># the code below will not be run</span></span>
<span id="cb94-7"><a href="base.html#cb94-7"></a>  mu &lt;-<span class="st"> </span><span class="kw">mean</span>(heights_str)</span>
<span id="cb94-8"><a href="base.html#cb94-8"></a>}</span>
<span id="cb94-9"><a href="base.html#cb94-9"></a><span class="co">#&gt; Error in eval(expr, envir, enclos): 无法对字符串计算！</span></span></code></pre></div>
<p>In general, I recommend using <code>message()</code>/<code>print()</code>, <code>warning()</code> and <code>stop()</code> as needed; they correspond to three levels of information:</p>
<ul>
<li><code>message()</code>/<code>print()</code> give ordinary output.</li>
<li><code>warning()</code> gives a warning that deserves attention.</li>
<li><code>stop()</code> gives an error that halts the program.</li>
</ul>
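<p>The three levels can be combined in a single function. The sketch below is only an illustration under my own assumptions: <code>check_heights()</code> and its messages are hypothetical and do not appear elsewhere in this book:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># check_heights() is a hypothetical helper used only for illustration
check_heights &lt;- function(x) {
  if (!is.numeric(x)) {
    stop(&quot;x must be numeric&quot;)                # fatal: abort immediately
  }
  if (anyNA(x)) {
    warning(&quot;x contains missing values&quot;)     # suspicious, but we can continue
  }
  message(&quot;Checked &quot;, length(x), &quot; values&quot;)  # routine progress information
  mean(x, na.rm = TRUE)
}

# emits a warning and a message, then returns the mean of the non-missing values
result &lt;- check_heights(c(1.70, 1.72, NA))</code></pre></div>
<p>A caller who finds the extra output distracting can wrap the call in <code>suppressMessages()</code> or <code>suppressWarnings()</code>; an error raised by <code>stop()</code> cannot be silenced that way.</p>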
</div>
</div>
<div id="循环控制" class="section level3">
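<h3><span class="header-section-number">2.2.2</span> Loop control</h3>
<p>When an operation (or a group of operations) has to be repeated, loops come into play. Loops in R have long been criticized for being slow, but their performance has in fact improved considerably. Compared with the <code>apply</code> family of functions covered later, loop statements are more readable and easier to understand and debug, so I personally recommend them to beginners. If the loop statements in this section really do become a bottleneck in your program, it is not too late to look for alternatives then.</p>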
<blockquote>
<p>Let me stress that, whether in programming or in research analysis, <strong>getting it done</strong> always matters more than <strong>doing it efficiently</strong>.</p>
</blockquote>
<div id="for-语句" class="section level4">
<h4><span class="header-section-number">2.2.2.1</span> The for statement</h4>
<p>A for statement is used together with an iteration variable and the in keyword. Its structure is:</p>
<div class="sourceCode" id="cb95"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb95-1"><a href="base.html#cb95-1"></a><span class="cf">for</span> (i <span class="cf">in</span> obj) {</span>
<span id="cb95-2"><a href="base.html#cb95-2"></a>  <span class="co"># any number of statements go here</span></span>
<span id="cb95-3"><a href="base.html#cb95-3"></a>}</span></code></pre></div>
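<p>Here <code>i</code> is the iteration variable; it can be an index or an element (a subset) of the data. <code>obj</code> is an iterable object.</p>
<p>There are two ways to loop over the variable <code>heights</code> and print its contents:</p>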
<div class="sourceCode" id="cb96"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb96-1"><a href="base.html#cb96-1"></a><span class="co"># Approach 1</span></span>
<span id="cb96-2"><a href="base.html#cb96-2"></a><span class="co"># iterate directly over the object itself</span></span>
<span id="cb96-3"><a href="base.html#cb96-3"></a><span class="cf">for</span> (i <span class="cf">in</span> heights) {</span>
<span id="cb96-4"><a href="base.html#cb96-4"></a>  <span class="kw">print</span>(i)</span>
<span id="cb96-5"><a href="base.html#cb96-5"></a>}</span>
<span id="cb96-6"><a href="base.html#cb96-6"></a><span class="co">#&gt; [1] 1.7</span></span>
<span id="cb96-7"><a href="base.html#cb96-7"></a><span class="co">#&gt; [1] 1.72</span></span>
<span id="cb96-8"><a href="base.html#cb96-8"></a><span class="co">#&gt; [1] 1.8</span></span>
<span id="cb96-9"><a href="base.html#cb96-9"></a><span class="co">#&gt; [1] 1.66</span></span>
<span id="cb96-10"><a href="base.html#cb96-10"></a><span class="co">#&gt; [1] 1.65</span></span>
<span id="cb96-11"><a href="base.html#cb96-11"></a><span class="co">#&gt; [1] 1.88</span></span>
<span id="cb96-12"><a href="base.html#cb96-12"></a></span>
<span id="cb96-13"><a href="base.html#cb96-13"></a><span class="co"># Approach 2</span></span>
<span id="cb96-14"><a href="base.html#cb96-14"></a><span class="co"># iterate by index</span></span>
<span id="cb96-15"><a href="base.html#cb96-15"></a><span class="cf">for</span> (i <span class="cf">in</span> <span class="dv">1</span><span class="op">:</span><span class="kw">length</span>(heights)) {</span>
<span id="cb96-16"><a href="base.html#cb96-16"></a>  <span class="kw">print</span>(heights[i])</span>
<span id="cb96-17"><a href="base.html#cb96-17"></a>}</span>
<span id="cb96-18"><a href="base.html#cb96-18"></a><span class="co">#&gt; Student: 1 </span></span>
<span id="cb96-19"><a href="base.html#cb96-19"></a><span class="co">#&gt;        1.7 </span></span>
<span id="cb96-20"><a href="base.html#cb96-20"></a><span class="co">#&gt; Student: 2 </span></span>
<span id="cb96-21"><a href="base.html#cb96-21"></a><span class="co">#&gt;       1.72 </span></span>
<span id="cb96-22"><a href="base.html#cb96-22"></a><span class="co">#&gt; Student: 3 </span></span>
<span id="cb96-23"><a href="base.html#cb96-23"></a><span class="co">#&gt;        1.8 </span></span>
<span id="cb96-24"><a href="base.html#cb96-24"></a><span class="co">#&gt; Student: 4 </span></span>
<span id="cb96-25"><a href="base.html#cb96-25"></a><span class="co">#&gt;       1.66 </span></span>
<span id="cb96-26"><a href="base.html#cb96-26"></a><span class="co">#&gt; Student: 5 </span></span>
<span id="cb96-27"><a href="base.html#cb96-27"></a><span class="co">#&gt;       1.65 </span></span>
<span id="cb96-28"><a href="base.html#cb96-28"></a><span class="co">#&gt; Student: 6 </span></span>
<span id="cb96-29"><a href="base.html#cb96-29"></a><span class="co">#&gt;       1.88</span></span></code></pre></div>
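<p>The second approach looks more complicated, but in more complex programs it keeps the logic clearer. Note that the two outputs differ: iterating over the values directly drops the element names, whereas <code>heights[i]</code> keeps them.</p>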
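<p>One case where index-based iteration pays off is when several vectors have to be walked in step. The sketch below assumes two hypothetical vectors, <code>heights_m</code> and <code>weights_kg</code>, with made-up values, and computes a body mass index for each position:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># heights_m and weights_kg are hypothetical data used only for illustration
heights_m  &lt;- c(1.70, 1.72, 1.80)
weights_kg &lt;- c(60, 65, 80)

bmi &lt;- numeric(length(heights_m))    # pre-allocate the result vector
for (i in seq_along(heights_m)) {    # seq_along(x) yields 1, 2, ..., length(x)
  bmi[i] &lt;- weights_kg[i] / heights_m[i]^2
}
round(bmi, 1)
#&gt; [1] 20.8 22.0 24.7</code></pre></div>
<p>The same index <code>i</code> addresses the input vectors and the result, which is hard to express when looping over the values of a single vector.</p>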
<p>A common beginner mistake is to write a scalar after in instead of an iterable sequence, as in:</p>
<div class="sourceCode" id="cb97"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb97-1"><a href="base.html#cb97-1"></a><span class="cf">for</span> (i <span class="cf">in</span> <span class="kw">length</span>(heights)) {</span>
<span id="cb97-2"><a href="base.html#cb97-2"></a>  <span class="kw">print</span>(heights[i])</span>
<span id="cb97-3"><a href="base.html#cb97-3"></a>}</span>
<span id="cb97-4"><a href="base.html#cb97-4"></a><span class="co">#&gt; Student: 6 </span></span>
<span id="cb97-5"><a href="base.html#cb97-5"></a><span class="co">#&gt;       1.88</span></span></code></pre></div>
<p>Note the difference between the following two expressions:</p>
<div class="sourceCode" id="cb98"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb98-1"><a href="base.html#cb98-1"></a><span class="kw">length</span>(heights)</span>
<span id="cb98-2"><a href="base.html#cb98-2"></a><span class="co">#&gt; [1] 6</span></span>
<span id="cb98-3"><a href="base.html#cb98-3"></a></span>
<span id="cb98-4"><a href="base.html#cb98-4"></a><span class="dv">1</span><span class="op">:</span><span class="kw">length</span>(heights)</span>
<span id="cb98-5"><a href="base.html#cb98-5"></a><span class="co">#&gt; [1] 1 2 3 4 5 6</span></span></code></pre></div>
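<p>A better practice is to use <code>seq_along(heights)</code> instead of <code>1:length(heights)</code>, because <code>1:length(x)</code> evaluates to <code>c(1, 0)</code> when <code>x</code> is empty, whereas <code>seq_along(x)</code> is then empty and the loop body is skipped:</p>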
<div class="sourceCode" id="cb99"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb99-1"><a href="base.html#cb99-1"></a><span class="cf">for</span> (i <span class="cf">in</span> <span class="kw">seq_along</span>(heights)) {</span>
<span id="cb99-2"><a href="base.html#cb99-2"></a>  <span class="kw">print</span>(heights[i])</span>
<span id="cb99-3"><a href="base.html#cb99-3"></a>}</span>
<span id="cb99-4"><a href="base.html#cb99-4"></a><span class="co">#&gt; Student: 1 </span></span>
<span id="cb99-5"><a href="base.html#cb99-5"></a><span class="co">#&gt;        1.7 </span></span>
<span id="cb99-6"><a href="base.html#cb99-6"></a><span class="co">#&gt; Student: 2 </span></span>
<span id="cb99-7"><a href="base.html#cb99-7"></a><span class="co">#&gt;       1.72 </span></span>
<span id="cb99-8"><a href="base.html#cb99-8"></a><span class="co">#&gt; Student: 3 </span></span>
<span id="cb99-9"><a href="base.html#cb99-9"></a><span class="co">#&gt;        1.8 </span></span>
<span id="cb99-10"><a href="base.html#cb99-10"></a><span class="co">#&gt; Student: 4 </span></span>
<span id="cb99-11"><a href="base.html#cb99-11"></a><span class="co">#&gt;       1.66 </span></span>
<span id="cb99-12"><a href="base.html#cb99-12"></a><span class="co">#&gt; Student: 5 </span></span>
<span id="cb99-13"><a href="base.html#cb99-13"></a><span class="co">#&gt;       1.65 </span></span>
<span id="cb99-14"><a href="base.html#cb99-14"></a><span class="co">#&gt; Student: 6 </span></span>
<span id="cb99-15"><a href="base.html#cb99-15"></a><span class="co">#&gt;       1.88</span></span></code></pre></div>
<p><code>seq_along()</code> returns the index sequence of its argument:</p>
<div class="sourceCode" id="cb100"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb100-1"><a href="base.html#cb100-1"></a><span class="kw">seq_along</span>(heights)</span>
<span id="cb100-2"><a href="base.html#cb100-2"></a><span class="co">#&gt; [1] 1 2 3 4 5 6</span></span></code></pre></div>
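<p>The advantage of <code>seq_along()</code> shows up with empty input. The following sketch uses a hypothetical empty vector <code>empty</code> and counts how many times each loop body runs:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">empty &lt;- numeric(0)   # hypothetical empty input

1:length(empty)       # counts down from 1 to 0
#&gt; [1] 1 0
seq_along(empty)      # no indices at all
#&gt; integer(0)

n_bad &lt;- 0
for (i in 1:length(empty)) n_bad &lt;- n_bad + 1     # body runs twice
n_good &lt;- 0
for (i in seq_along(empty)) n_good &lt;- n_good + 1  # body never runs
c(n_bad, n_good)
#&gt; [1] 2 0</code></pre></div>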
</div>
<div id="while-语句" class="section level4">
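<h4><span class="header-section-number">2.2.2.2</span> The while statement</h4>
<p>The for statement covers most everyday needs; the while statement is particularly well suited to algorithm design, where:</p>
<ul>
<li>the number of iterations is not known in advance;</li>
<li>the condition for leaving the loop is known.</li>
</ul>
<p>Here is a simple example:</p>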
<div class="sourceCode" id="cb101"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb101-1"><a href="base.html#cb101-1"></a>v &lt;-<span class="st"> </span><span class="dv">10</span></span>
<span id="cb101-2"><a href="base.html#cb101-2"></a><span class="cf">while</span>(v <span class="op">&gt;</span><span class="st"> </span><span class="dv">2</span>) {</span>
<span id="cb101-3"><a href="base.html#cb101-3"></a>  <span class="kw">print</span>(v)</span>
<span id="cb101-4"><a href="base.html#cb101-4"></a>  v &lt;-<span class="st"> </span>v <span class="op">-</span><span class="st"> </span><span class="fl">1.1</span></span>
<span id="cb101-5"><a href="base.html#cb101-5"></a>}</span>
<span id="cb101-6"><a href="base.html#cb101-6"></a><span class="co">#&gt; [1] 10</span></span>
<span id="cb101-7"><a href="base.html#cb101-7"></a><span class="co">#&gt; [1] 8.9</span></span>
<span id="cb101-8"><a href="base.html#cb101-8"></a><span class="co">#&gt; [1] 7.8</span></span>
<span id="cb101-9"><a href="base.html#cb101-9"></a><span class="co">#&gt; [1] 6.7</span></span>
<span id="cb101-10"><a href="base.html#cb101-10"></a><span class="co">#&gt; [1] 5.6</span></span>
<span id="cb101-11"><a href="base.html#cb101-11"></a><span class="co">#&gt; [1] 4.5</span></span>
<span id="cb101-12"><a href="base.html#cb101-12"></a><span class="co">#&gt; [1] 3.4</span></span>
<span id="cb101-13"><a href="base.html#cb101-13"></a><span class="co">#&gt; [1] 2.3</span></span></code></pre></div>
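<p>A more algorithmic use of while, given here only as a sketch, is an iteration that runs until a result is accurate enough. The example approximates the square root of 2 with Newton's method; the variable names and the tolerance <code>1e-10</code> are my own choices for illustration:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">target &lt;- 2   # number whose square root we want
guess  &lt;- 1   # starting value
steps  &lt;- 0   # how many updates were needed

# we do not know the number of iterations, only the exit condition
while (abs(guess^2 - target) &gt; 1e-10) {
  guess &lt;- (guess + target / guess) / 2   # Newton update for x^2 = target
  steps &lt;- steps + 1
}

guess   # close to sqrt(2)
steps   # number of iterations actually performed</code></pre></div>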
</div>
<div id="repeat-语句与循环退出" class="section level4">
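<h4><span class="header-section-number">2.2.2.3</span> The repeat statement and exiting loops</h4>
<p>I have never used the repeat statement myself. When the exit test is placed at the end of its body, it resembles the do-while statement in C: a block of code runs first, and then a check decides whether to leave the loop.</p>
<p>Its structure is:</p>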
<div class="sourceCode" id="cb102"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb102-1"><a href="base.html#cb102-1"></a><span class="cf">repeat</span> EXPR</span></code></pre></div>
<p>EXPR stands for a block of statements. To exit a repeat loop, we need the break statement.</p>
<p>Here is a simple example:</p>
<div class="sourceCode" id="cb103"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb103-1"><a href="base.html#cb103-1"></a>i &lt;-<span class="st"> </span><span class="dv">1</span></span>
<span id="cb103-2"><a href="base.html#cb103-2"></a></span>
<span id="cb103-3"><a href="base.html#cb103-3"></a><span class="cf">repeat</span>{</span>
<span id="cb103-4"><a href="base.html#cb103-4"></a>  <span class="kw">print</span>(i)</span>
<span id="cb103-5"><a href="base.html#cb103-5"></a>  i &lt;-<span class="st"> </span>i<span class="op">*</span><span class="dv">2</span></span>
<span id="cb103-6"><a href="base.html#cb103-6"></a>  <span class="cf">if</span> (i <span class="op">&gt;</span><span class="st"> </span><span class="dv">100</span>) <span class="cf">break</span></span>
<span id="cb103-7"><a href="base.html#cb103-7"></a>}</span>
<span id="cb103-8"><a href="base.html#cb103-8"></a><span class="co">#&gt; [1] 1</span></span>
<span id="cb103-9"><a href="base.html#cb103-9"></a><span class="co">#&gt; [1] 2</span></span>
<span id="cb103-10"><a href="base.html#cb103-10"></a><span class="co">#&gt; [1] 4</span></span>
<span id="cb103-11"><a href="base.html#cb103-11"></a><span class="co">#&gt; [1] 8</span></span>
<span id="cb103-12"><a href="base.html#cb103-12"></a><span class="co">#&gt; [1] 16</span></span>
<span id="cb103-13"><a href="base.html#cb103-13"></a><span class="co">#&gt; [1] 32</span></span>
<span id="cb103-14"><a href="base.html#cb103-14"></a><span class="co">#&gt; [1] 64</span></span></code></pre></div>
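<p>Once executed, break leaves the current loop. There is also the next statement, which skips the rest of the current iteration and moves on to the next one.</p>
<p>Building on the example above, here is another one:</p>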
<div class="sourceCode" id="cb104"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb104-1"><a href="base.html#cb104-1"></a>i &lt;-<span class="st"> </span><span class="dv">1</span></span>
<span id="cb104-2"><a href="base.html#cb104-2"></a></span>
<span id="cb104-3"><a href="base.html#cb104-3"></a><span class="cf">repeat</span>{</span>
<span id="cb104-4"><a href="base.html#cb104-4"></a>  <span class="kw">print</span>(i)</span>
<span id="cb104-5"><a href="base.html#cb104-5"></a>  i &lt;-<span class="st"> </span>i<span class="op">*</span><span class="dv">2</span></span>
<span id="cb104-6"><a href="base.html#cb104-6"></a>  <span class="cf">if</span> (i <span class="op">&gt;</span><span class="st"> </span><span class="dv">200</span>) <span class="cf">break</span></span>
<span id="cb104-7"><a href="base.html#cb104-7"></a>  <span class="cf">if</span> (i <span class="op">&gt;</span><span class="st"> </span><span class="dv">100</span>) <span class="cf">next</span></span>
<span id="cb104-8"><a href="base.html#cb104-8"></a>  <span class="kw">print</span>(<span class="st">&quot;Can you see me?&quot;</span>)</span>
<span id="cb104-9"><a href="base.html#cb104-9"></a>}</span>
<span id="cb104-10"><a href="base.html#cb104-10"></a><span class="co">#&gt; [1] 1</span></span>
<span id="cb104-11"><a href="base.html#cb104-11"></a><span class="co">#&gt; [1] &quot;Can you see me?&quot;</span></span>
<span id="cb104-12"><a href="base.html#cb104-12"></a><span class="co">#&gt; [1] 2</span></span>
<span id="cb104-13"><a href="base.html#cb104-13"></a><span class="co">#&gt; [1] &quot;Can you see me?&quot;</span></span>
<span id="cb104-14"><a href="base.html#cb104-14"></a><span class="co">#&gt; [1] 4</span></span>
<span id="cb104-15"><a href="base.html#cb104-15"></a><span class="co">#&gt; [1] &quot;Can you see me?&quot;</span></span>
<span id="cb104-16"><a href="base.html#cb104-16"></a><span class="co">#&gt; [1] 8</span></span>
<span id="cb104-17"><a href="base.html#cb104-17"></a><span class="co">#&gt; [1] &quot;Can you see me?&quot;</span></span>
<span id="cb104-18"><a href="base.html#cb104-18"></a><span class="co">#&gt; [1] 16</span></span>
<span id="cb104-19"><a href="base.html#cb104-19"></a><span class="co">#&gt; [1] &quot;Can you see me?&quot;</span></span>
<span id="cb104-20"><a href="base.html#cb104-20"></a><span class="co">#&gt; [1] 32</span></span>
<span id="cb104-21"><a href="base.html#cb104-21"></a><span class="co">#&gt; [1] &quot;Can you see me?&quot;</span></span>
<span id="cb104-22"><a href="base.html#cb104-22"></a><span class="co">#&gt; [1] 64</span></span>
<span id="cb104-23"><a href="base.html#cb104-23"></a><span class="co">#&gt; [1] 128</span></span></code></pre></div>
<p>Once <code>i &gt; 100</code>, the final print statement is no longer executed.</p>
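<p>break and next work the same way inside for and while loops. As a small sketch with made-up data (the vector <code>values</code> is hypothetical), the loop below skips missing values with next and stops at the first negative number with break:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">values &lt;- c(3, NA, 5, -1, 8)   # hypothetical input
total  &lt;- 0

for (v in values) {
  if (is.na(v)) next    # skip missing values, continue with the next element
  if (v &lt; 0) break      # leave the loop at the first negative value
  total &lt;- total + v
}
total
#&gt; [1] 8</code></pre></div>
<p>Only 3 and 5 are added: the <code>NA</code> is skipped, and the loop ends before 8 is reached.</p>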
</div>
</div>
</div>
<div id="函数与函数式编程" class="section level2">
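<h2><span class="header-section-number">2.3</span> Functions and functional programming</h2>
<p><strong>A function is a code template</strong>.</p>
<p>Earlier we used symbols to abstract data into what we call variables: a variable name conveys the meaning of the data it refers to while hiding the underlying structure. Similarly, we use symbols to abstract the set of operations carried out by a block of code, and call the result a <strong>function</strong>.</p>
<ul>
<li>variable &lt;- data.</li>
<li>function &lt;- operations.</li>
</ul>
<p>In this way, a function lets a group of operations be reused just like a variable.</p>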
<div id="创建和使用函数" class="section level3">
<h3><span class="header-section-number">2.3.1</span> Creating and using functions</h3>
<p>Let us define our own function for computing a mean to see how a function is created:</p>
<div class="sourceCode" id="cb105"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb105-1"><a href="base.html#cb105-1"></a>customMean &lt;-<span class="st"> </span><span class="cf">function</span>(x) {  <span class="co"># x is the input parameter</span></span>
<span id="cb105-2"><a href="base.html#cb105-2"></a>  </span>
<span id="cb105-3"><a href="base.html#cb105-3"></a>  <span class="co"># the operations, i.e. the code block</span></span>
<span id="cb105-4"><a href="base.html#cb105-4"></a>  s &lt;-<span class="st"> </span>i &lt;-<span class="st"> </span><span class="dv">0</span></span>
<span id="cb105-5"><a href="base.html#cb105-5"></a>  <span class="cf">for</span> (j <span class="cf">in</span> x) {</span>
<span id="cb105-6"><a href="base.html#cb105-6"></a>    s &lt;-<span class="st"> </span>s <span class="op">+</span><span class="st"> </span>j</span>
<span id="cb105-7"><a href="base.html#cb105-7"></a>    i &lt;-<span class="st"> </span>i <span class="op">+</span><span class="st"> </span><span class="dv">1</span></span>
<span id="cb105-8"><a href="base.html#cb105-8"></a>  }</span>
<span id="cb105-9"><a href="base.html#cb105-9"></a>  </span>
<span id="cb105-10"><a href="base.html#cb105-10"></a>  <span class="kw">return</span>(s <span class="op">/</span><span class="st"> </span>i)  <span class="co"># s / i is the return value</span></span>
<span id="cb105-11"><a href="base.html#cb105-11"></a>}</span></code></pre></div>
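<p>A function has three parts: input parameters, a code block and a return value. When <code>return()</code> is not used, a function returns the value of the last expression evaluated, so replacing <code>return(s / i)</code> with <code>s / i</code> in the code above gives exactly the same result, although the intent is less explicit than with <code>return()</code>.</p>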
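<p>The equivalence of the two styles, and one situation where an explicit <code>return()</code> is genuinely needed, can be sketched as follows. All three function names are hypothetical and serve only as illustration:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># hypothetical examples, not part of the original text
square_explicit &lt;- function(x) {
  return(x^2)
}
square_implicit &lt;- function(x) {
  x^2   # the value of the last expression is returned
}
identical(square_explicit(4), square_implicit(4))
#&gt; [1] TRUE

# return() is required to leave a function early
first_negative &lt;- function(x) {
  for (v in x) {
    if (v &lt; 0) return(v)   # stop as soon as a negative value is found
  }
  NA                       # reached only if no negative value exists
}
first_negative(c(3, -2, -7))
#&gt; [1] -2</code></pre></div>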
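<p>Next, let us see how to use <code>customMean()</code>. When creating it we implicitly assumed that the input is a numeric vector, so let us try that first:</p>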
<div class="sourceCode" id="cb106"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb106-1"><a href="base.html#cb106-1"></a><span class="kw">customMean</span>(<span class="dt">x =</span> <span class="kw">c</span>(<span class="dv">1</span>, <span class="dv">2</span>, <span class="dv">3</span>))</span>
<span id="cb106-2"><a href="base.html#cb106-2"></a><span class="co">#&gt; [1] 2</span></span></code></pre></div>
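<p>The result is correct.</p>
<p>Suppose we want not only to return the result but also to print information about the computation. We can implement a new version of the function:</p>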
<div class="sourceCode" id="cb107"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb107-1"><a href="base.html#cb107-1"></a>customMean_v2 &lt;-<span class="st"> </span><span class="cf">function</span>(x) {  </span>
<span id="cb107-2"><a href="base.html#cb107-2"></a>  </span>
<span id="cb107-3"><a href="base.html#cb107-3"></a>  s &lt;-<span class="st"> </span>i &lt;-<span class="st"> </span><span class="dv">0</span></span>
<span id="cb107-4"><a href="base.html#cb107-4"></a>  <span class="cf">for</span> (j <span class="cf">in</span> x) {</span>
<span id="cb107-5"><a href="base.html#cb107-5"></a>    s &lt;-<span class="st"> </span>s <span class="op">+</span><span class="st"> </span>j</span>
<span id="cb107-6"><a href="base.html#cb107-6"></a>    i &lt;-<span class="st"> </span>i <span class="op">+</span><span class="st"> </span><span class="dv">1</span></span>
<span id="cb107-7"><a href="base.html#cb107-7"></a>  }</span>
<span id="cb107-8"><a href="base.html#cb107-8"></a>  </span>
<span id="cb107-9"><a href="base.html#cb107-9"></a>  mu &lt;-<span class="st"> </span>s <span class="op">/</span><span class="st"> </span>i</span>
<span id="cb107-10"><a href="base.html#cb107-10"></a>  </span>
<span id="cb107-11"><a href="base.html#cb107-11"></a>  <span class="kw">message</span>(</span>
<span id="cb107-12"><a href="base.html#cb107-12"></a>    <span class="st">&quot;Mean of sequence &quot;</span>,</span>
<span id="cb107-13"><a href="base.html#cb107-13"></a>    <span class="kw">paste</span>(x, <span class="dt">collapse =</span> <span class="st">&quot;,&quot;</span>),</span>
<span id="cb107-14"><a href="base.html#cb107-14"></a>    <span class="st">&quot; is &quot;</span>,</span>
<span id="cb107-15"><a href="base.html#cb107-15"></a>    mu</span>
<span id="cb107-16"><a href="base.html#cb107-16"></a>  )</span>
<span id="cb107-17"><a href="base.html#cb107-17"></a>  </span>
<span id="cb107-18"><a href="base.html#cb107-18"></a>  <span class="kw">return</span>(mu)  </span>
<span id="cb107-19"><a href="base.html#cb107-19"></a>}</span></code></pre></div>
<p>Let us look at the result again:</p>
<div class="sourceCode" id="cb108"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb108-1"><a href="base.html#cb108-1"></a><span class="kw">customMean_v2</span>(<span class="dt">x =</span> <span class="kw">c</span>(<span class="dv">1</span><span class="op">:</span><span class="dv">3</span>))</span>
<span id="cb108-2"><a href="base.html#cb108-2"></a><span class="co">#&gt; Mean of sequence 1,2,3 is 2</span></span>
<span id="cb108-3"><a href="base.html#cb108-3"></a><span class="co">#&gt; [1] 2</span></span></code></pre></div>
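<p>The result now looks friendlier. On reflection, though, the updated function introduces a new problem: if 10000 numbers are being summed, is printing all of them still a good idea?</p>
<p>Let us introduce another version that handles whether and how to print:</p>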
<div class="sourceCode" id="cb109"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb109-1"><a href="base.html#cb109-1"></a>customMean_v3 &lt;-<span class="st"> </span><span class="cf">function</span>(x, <span class="dt">verbose =</span> <span class="ot">TRUE</span>) {  </span>
<span id="cb109-2"><a href="base.html#cb109-2"></a>  </span>
<span id="cb109-3"><a href="base.html#cb109-3"></a>  s &lt;-<span class="st"> </span>i &lt;-<span class="st"> </span><span class="dv">0</span></span>
<span id="cb109-4"><a href="base.html#cb109-4"></a>  <span class="cf">for</span> (j <span class="cf">in</span> x) {</span>
<span id="cb109-5"><a href="base.html#cb109-5"></a>    s &lt;-<span class="st"> </span>s <span class="op">+</span><span class="st"> </span>j</span>
<span id="cb109-6"><a href="base.html#cb109-6"></a>    i &lt;-<span class="st"> </span>i <span class="op">+</span><span class="st"> </span><span class="dv">1</span></span>
<span id="cb109-7"><a href="base.html#cb109-7"></a>  }</span>
<span id="cb109-8"><a href="base.html#cb109-8"></a>  </span>
<span id="cb109-9"><a href="base.html#cb109-9"></a>  mu &lt;-<span class="st"> </span>s <span class="op">/</span><span class="st"> </span>i</span>
<span id="cb109-10"><a href="base.html#cb109-10"></a>  </span>
<span id="cb109-11"><a href="base.html#cb109-11"></a>  <span class="cf">if</span> (verbose) {</span>
<span id="cb109-12"><a href="base.html#cb109-12"></a>    l &lt;-<span class="st"> </span><span class="kw">length</span>(x)</span>
<span id="cb109-13"><a href="base.html#cb109-13"></a>    <span class="cf">if</span> (l <span class="op">&gt;</span><span class="st"> </span><span class="dv">10</span>) {</span>
<span id="cb109-14"><a href="base.html#cb109-14"></a>      <span class="kw">message</span>(</span>
<span id="cb109-15"><a href="base.html#cb109-15"></a>        <span class="st">&quot;Mean of sequence &quot;</span>,</span>
<span id="cb109-16"><a href="base.html#cb109-16"></a>        <span class="kw">paste</span>(<span class="kw">c</span>(x[<span class="dv">1</span><span class="op">:</span><span class="dv">5</span>], <span class="st">&quot;...&quot;</span>, x[(l<span class="dv">-4</span>)<span class="op">:</span>l]), <span class="dt">collapse =</span> <span class="st">&quot;,&quot;</span>),</span>
<span id="cb109-17"><a href="base.html#cb109-17"></a>        <span class="st">&quot; is &quot;</span>,</span>
<span id="cb109-18"><a href="base.html#cb109-18"></a>        mu</span>
<span id="cb109-19"><a href="base.html#cb109-19"></a>      )</span>
<span id="cb109-20"><a href="base.html#cb109-20"></a>    } <span class="cf">else</span> {</span>
<span id="cb109-21"><a href="base.html#cb109-21"></a>      <span class="kw">message</span>(</span>
<span id="cb109-22"><a href="base.html#cb109-22"></a>        <span class="st">&quot;Mean of sequence &quot;</span>,</span>
<span id="cb109-23"><a href="base.html#cb109-23"></a>        <span class="kw">paste</span>(x, <span class="dt">collapse =</span> <span class="st">&quot;,&quot;</span>),</span>
<span id="cb109-24"><a href="base.html#cb109-24"></a>        <span class="st">&quot; is &quot;</span>,</span>
<span id="cb109-25"><a href="base.html#cb109-25"></a>        mu</span>
<span id="cb109-26"><a href="base.html#cb109-26"></a>      )</span>
<span id="cb109-27"><a href="base.html#cb109-27"></a>    }</span>
<span id="cb109-28"><a href="base.html#cb109-28"></a>  }</span>
<span id="cb109-29"><a href="base.html#cb109-29"></a>  </span>
<span id="cb109-30"><a href="base.html#cb109-30"></a>  <span class="kw">return</span>(mu)  </span>
<span id="cb109-31"><a href="base.html#cb109-31"></a>}</span></code></pre></div>
<p>Let us try this function with a short input and a long one:</p>
<div class="sourceCode" id="cb110"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb110-1"><a href="base.html#cb110-1"></a><span class="kw">customMean_v3</span>(<span class="dt">x =</span> <span class="dv">1</span><span class="op">:</span><span class="dv">10</span>)</span>
<span id="cb110-2"><a href="base.html#cb110-2"></a><span class="co">#&gt; Mean of sequence 1,2,3,4,5,6,7,8,9,10 is 5.5</span></span>
<span id="cb110-3"><a href="base.html#cb110-3"></a><span class="co">#&gt; [1] 5.5</span></span>
<span id="cb110-4"><a href="base.html#cb110-4"></a><span class="kw">customMean_v3</span>(<span class="dt">x =</span> <span class="dv">1</span><span class="op">:</span><span class="dv">100</span>)</span>
<span id="cb110-5"><a href="base.html#cb110-5"></a><span class="co">#&gt; Mean of sequence 1,2,3,4,5,...,96,97,98,99,100 is 50.5</span></span>
<span id="cb110-6"><a href="base.html#cb110-6"></a><span class="co">#&gt; [1] 50.5</span></span></code></pre></div>
<p>In addition, the new version introduces a parameter <code>verbose</code> with a default value, so we can choose not to print the message:</p>
<div class="sourceCode" id="cb111"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb111-1"><a href="base.html#cb111-1"></a><span class="kw">customMean_v3</span>(<span class="dt">x =</span> <span class="dv">1</span><span class="op">:</span><span class="dv">100</span>, <span class="dt">verbose =</span> <span class="ot">FALSE</span>)</span>
<span id="cb111-2"><a href="base.html#cb111-2"></a><span class="co">#&gt; [1] 50.5</span></span></code></pre></div>
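<p>When arguments are supplied in the order in which the parameters are defined, their names can be omitted; the call below gives the same result:</p>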
<div class="sourceCode" id="cb112"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb112-1"><a href="base.html#cb112-1"></a><span class="kw">customMean_v3</span>(<span class="dv">1</span><span class="op">:</span><span class="dv">100</span>, <span class="ot">FALSE</span>)</span>
<span id="cb112-2"><a href="base.html#cb112-2"></a><span class="co">#&gt; [1] 50.5</span></span></code></pre></div>
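<p>The rules for matching arguments can be summarised with a small sketch. The function <code>power()</code> below is hypothetical and exists only to show positional matching, matching by name, and a default value:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># power() is a hypothetical function used only for illustration
power &lt;- function(base, exponent = 2) {
  base^exponent
}

power(3)                        # exponent falls back to its default
#&gt; [1] 9
power(2, 3)                     # matched by position
#&gt; [1] 8
power(exponent = 3, base = 2)   # matched by name, so the order does not matter
#&gt; [1] 8</code></pre></div>
<p>Naming arguments costs a few keystrokes but protects against silently swapping two values, so I would lean towards naming everything after the first argument.</p>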
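<p>All of the calls above assume that the user knows the input must be a numeric vector, which cannot always be guaranteed, for example when you send the code to someone who does not program. In that case it is worth adding argument checks and comments, which leads to a new version of the function:</p>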
<div class="sourceCode" id="cb113"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb113-1"><a href="base.html#cb113-1"></a><span class="co"># @title Compute the mean</span></span>
<span id="cb113-2"><a href="base.html#cb113-2"></a><span class="co"># @param x input data, a numeric vector</span></span>
<span id="cb113-3"><a href="base.html#cb113-3"></a><span class="co"># @param verbose logical, controls whether to print a message</span></span>
<span id="cb113-4"><a href="base.html#cb113-4"></a>customMean_v4 &lt;-<span class="st"> </span><span class="cf">function</span>(x, <span class="dt">verbose =</span> <span class="ot">TRUE</span>) {  </span>
<span id="cb113-5"><a href="base.html#cb113-5"></a>  </span>
<span id="cb113-6"><a href="base.html#cb113-6"></a>  <span class="cf">if</span> (<span class="op">!</span><span class="kw">is.numeric</span>(x)) {</span>
<span id="cb113-7"><a href="base.html#cb113-7"></a>    <span class="kw">stop</span>(<span class="st">&quot;输入数据必须是一个数值型向量！&quot;</span>)</span>
<span id="cb113-8"><a href="base.html#cb113-8"></a>  }</span>
<span id="cb113-9"><a href="base.html#cb113-9"></a>  </span>
<span id="cb113-10"><a href="base.html#cb113-10"></a>  s &lt;-<span class="st"> </span>i &lt;-<span class="st"> </span><span class="dv">0</span></span>
<span id="cb113-11"><a href="base.html#cb113-11"></a>  <span class="cf">for</span> (j <span class="cf">in</span> x) {</span>
<span id="cb113-12"><a href="base.html#cb113-12"></a>    s &lt;-<span class="st"> </span>s <span class="op">+</span><span class="st"> </span>j</span>
<span id="cb113-13"><a href="base.html#cb113-13"></a>    i &lt;-<span class="st"> </span>i <span class="op">+</span><span class="st"> </span><span class="dv">1</span></span>
<span id="cb113-14"><a href="base.html#cb113-14"></a>  }</span>
<span id="cb113-15"><a href="base.html#cb113-15"></a>  </span>
<span id="cb113-16"><a href="base.html#cb113-16"></a>  mu &lt;-<span class="st"> </span>s <span class="op">/</span><span class="st"> </span>i</span>
<span id="cb113-17"><a href="base.html#cb113-17"></a>  </span>
<span id="cb113-18"><a href="base.html#cb113-18"></a>  <span class="cf">if</span> (verbose) {</span>
<span id="cb113-19"><a href="base.html#cb113-19"></a>    l &lt;-<span class="st"> </span><span class="kw">length</span>(x)</span>
<span id="cb113-20"><a href="base.html#cb113-20"></a>    <span class="cf">if</span> (l <span class="op">&gt;</span><span class="st"> </span><span class="dv">10</span>) {</span>
<span id="cb113-21"><a href="base.html#cb113-21"></a>      <span class="kw">message</span>(</span>
<span id="cb113-22"><a href="base.html#cb113-22"></a>        <span class="st">&quot;Mean of sequence &quot;</span>,</span>
<span id="cb113-23"><a href="base.html#cb113-23"></a>        <span class="kw">paste</span>(<span class="kw">c</span>(x[<span class="dv">1</span><span class="op">:</span><span class="dv">5</span>], <span class="st">&quot;...&quot;</span>, x[(l<span class="dv">-4</span>)<span class="op">:</span>l]), <span class="dt">collapse =</span> <span class="st">&quot;,&quot;</span>),</span>
<span id="cb113-24"><a href="base.html#cb113-24"></a>        <span class="st">&quot; is &quot;</span>,</span>
<span id="cb113-25"><a href="base.html#cb113-25"></a>        mu</span>
<span id="cb113-26"><a href="base.html#cb113-26"></a>      )</span>
<span id="cb113-27"><a href="base.html#cb113-27"></a>    } <span class="cf">else</span> {</span>
<span id="cb113-28"><a href="base.html#cb113-28"></a>      <span class="kw">message</span>(</span>
<span id="cb113-29"><a href="base.html#cb113-29"></a>        <span class="st">&quot;Mean of sequence &quot;</span>,</span>
<span id="cb113-30"><a href="base.html#cb113-30"></a>        <span class="kw">paste</span>(x, <span class="dt">collapse =</span> <span class="st">&quot;,&quot;</span>),</span>
<span id="cb113-31"><a href="base.html#cb113-31"></a>        <span class="st">&quot; is &quot;</span>,</span>
<span id="cb113-32"><a href="base.html#cb113-32"></a>        mu</span>
<span id="cb113-33"><a href="base.html#cb113-33"></a>      )</span>
<span id="cb113-34"><a href="base.html#cb113-34"></a>    }</span>
<span id="cb113-35"><a href="base.html#cb113-35"></a>  }</span>
<span id="cb113-36"><a href="base.html#cb113-36"></a>  </span>
<span id="cb113-37"><a href="base.html#cb113-37"></a>  <span class="kw">return</span>(mu)  </span>
<span id="cb113-38"><a href="base.html#cb113-38"></a>}</span></code></pre></div>
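<p>Text beginning with <code>#</code> is treated by R as a comment. The <code>@title</code> and <code>@param</code> that follow are comment tags; they are <strong>optional</strong> and merely describe the content of the comments more clearly.</p>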
<blockquote>
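<p>These tags follow the conventions of the <strong>roxygen2</strong> package, which itself reads comment lines that begin with <code>#'</code>; interested readers can consult the package documentation.</p>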
</blockquote>
<div class="sourceCode" id="cb114"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb114-1"><a href="base.html#cb114-1"></a><span class="kw">customMean_v4</span>(<span class="kw">c</span>(<span class="st">&quot;1&quot;</span>, <span class="st">&quot;2&quot;</span>, <span class="st">&quot;3&quot;</span>))</span>
<span id="cb114-2"><a href="base.html#cb114-2"></a><span class="co">#&gt; Error in customMean_v4(c(&quot;1&quot;, &quot;2&quot;, &quot;3&quot;)): 输入数据必须是一个数值型向量！</span></span></code></pre></div>
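<p>Finally, let's look at computational efficiency by comparing the <code>customMean()</code> function we created with R's built-in <code>mean()</code> function. The <code>system.time()</code> function measures how long a call takes to run.</p>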
<div class="sourceCode" id="cb115"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb115-1"><a href="base.html#cb115-1"></a><span class="kw">system.time</span>(<span class="kw">customMean</span>(<span class="dv">1</span><span class="op">:</span><span class="fl">1e7</span>))</span>
<span id="cb115-2"><a href="base.html#cb115-2"></a><span class="co">#&gt;    user  system elapsed </span></span>
<span id="cb115-3"><a href="base.html#cb115-3"></a><span class="co">#&gt;    0.48    0.00    0.50</span></span>
<span id="cb115-4"><a href="base.html#cb115-4"></a><span class="kw">system.time</span>(<span class="kw">mean</span>(<span class="dv">1</span><span class="op">:</span><span class="fl">1e7</span>))</span>
<span id="cb115-5"><a href="base.html#cb115-5"></a><span class="co">#&gt;    user  system elapsed </span></span>
<span id="cb115-6"><a href="base.html#cb115-6"></a><span class="co">#&gt;    0.03    0.00    0.03</span></span></code></pre></div>
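<p>The <code>elapsed</code> entry gives the total wall-clock time (in seconds) the computer spent running the call. The built-in function is clearly much faster. This is not a rigorous benchmark, but it is enough to show the gap between the two.</p>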
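<p>A single <code>system.time()</code> call can be noisy. The sketch below repeats each call several times and compares the median elapsed time. Both <code>loop_mean()</code> and <code>median_elapsed()</code> are hypothetical helpers written for this illustration; <code>loop_mean()</code> is only a stand-in and is not the <code>customMean()</code> defined earlier.</p>

```r
# Hypothetical stand-in for a hand-written mean; not the customMean() defined earlier.
loop_mean <- function(x) {
  total <- 0
  for (value in x) {
    total <- total + value
  }
  total / length(x)
}

# Hypothetical helper: time fun(x) several times and return the median elapsed seconds.
median_elapsed <- function(fun, x, times = 5) {
  elapsed <- vapply(
    seq_len(times),
    function(i) system.time(fun(x))[["elapsed"]],
    numeric(1)
  )
  median(elapsed)
}

x <- 1:1e6
timings <- c(
  loop = median_elapsed(loop_mean, x),
  builtin = median_elapsed(mean, x)
)
timings
```

<p>The exact numbers depend on the machine and the R version, so no output is shown here.</p>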
</div>
<div id="函数式编程" class="section level3">
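<h3><span class="header-section-number">2.3.2</span> Functional programming</h3>
<p>Functions can do more than be called: <strong>they can also be passed as arguments to other functions and returned as values</strong>. This is a hallmark of functional programming.</p>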
<div id="传入和返回函数" class="section level4">
<h4><span class="header-section-number">2.3.2.1</span> Passing and returning functions</h4>
<p>For example, let's create a slightly odd-looking function:</p>
<div class="sourceCode" id="cb116"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb116-1"><a href="base.html#cb116-1"></a>f &lt;-<span class="st"> </span><span class="cf">function</span>(x, fun) {</span>
<span id="cb116-2"><a href="base.html#cb116-2"></a>  <span class="kw">fun</span>(x)</span>
<span id="cb116-3"><a href="base.html#cb116-3"></a>}</span></code></pre></div>
<p>It takes a common numeric function as an argument and uses it to compute the result. Before explaining, let's see it in action:</p>
<div class="sourceCode" id="cb117"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb117-1"><a href="base.html#cb117-1"></a><span class="kw">f</span>(<span class="dv">1</span><span class="op">:</span><span class="dv">10</span>, sum)</span>
<span id="cb117-2"><a href="base.html#cb117-2"></a><span class="co">#&gt; [1] 55</span></span>
<span id="cb117-3"><a href="base.html#cb117-3"></a><span class="kw">f</span>(<span class="dv">1</span><span class="op">:</span><span class="dv">10</span>, mean)</span>
<span id="cb117-4"><a href="base.html#cb117-4"></a><span class="co">#&gt; [1] 5.5</span></span>
<span id="cb117-5"><a href="base.html#cb117-5"></a><span class="kw">f</span>(<span class="dv">1</span><span class="op">:</span><span class="dv">10</span>, quantile)</span>
<span id="cb117-6"><a href="base.html#cb117-6"></a><span class="co">#&gt;    0%   25%   50%   75%  100% </span></span>
<span id="cb117-7"><a href="base.html#cb117-7"></a><span class="co">#&gt;  1.00  3.25  5.50  7.75 10.00</span></span></code></pre></div>
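<p>As you can see, the actual computation is done by the second argument. Inside <code>f()</code>, the function that was passed in, such as <code>mean()</code> or <code>sum()</code>, is bound to the name <code>fun</code> and then called as <code>fun()</code>.</p>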
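<p>The second argument does not have to be a built-in function. As a small sketch, the code below passes an anonymous function and a user-defined function to <code>f()</code>; <code>square_sum()</code> is a hypothetical name introduced only for this example.</p>

```r
f <- function(x, fun) {
  fun(x)
}

# Pass an anonymous function: the range (max minus min) of the input.
f(1:10, function(v) max(v) - min(v))
#> [1] 9

# Pass a named user-defined function (hypothetical helper).
square_sum <- function(v) sum(v^2)
f(1:3, square_sum)
#> [1] 14
```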
<p>We can also build an example in which a function is the return value:</p>
<div class="sourceCode" id="cb118"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb118-1"><a href="base.html#cb118-1"></a>f2 &lt;-<span class="st"> </span><span class="cf">function</span>(type) {</span>
<span id="cb118-2"><a href="base.html#cb118-2"></a>  <span class="cf">switch</span>(type,</span>
<span id="cb118-3"><a href="base.html#cb118-3"></a>         <span class="dt">mean =</span> mean,</span>
<span id="cb118-4"><a href="base.html#cb118-4"></a>         <span class="dt">sum =</span> sum,</span>
<span id="cb118-5"><a href="base.html#cb118-5"></a>         <span class="dt">quantile =</span> quantile)</span>
<span id="cb118-6"><a href="base.html#cb118-6"></a>}</span></code></pre></div>
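<p>The <code>f2()</code> function uses a switch statement. It could also be written with if-else statements (readers may want to try), but switch keeps the code more concise here.</p>
<p>Let's see the result:</p>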
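<p>For comparison, one possible if-else version might look like the sketch below. The name <code>f2_ifelse()</code> is hypothetical, and the final branch returns <code>NULL</code> to mirror what <code>switch()</code> yields when no alternative matches and no default is given.</p>

```r
f2_ifelse <- function(type) {
  if (type == "mean") {
    mean
  } else if (type == "sum") {
    sum
  } else if (type == "quantile") {
    quantile
  } else {
    # Mirror switch(): an unmatched type with no default yields NULL.
    NULL
  }
}

f2_ifelse("sum")(1:10)
#> [1] 55
```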
<div class="sourceCode" id="cb119"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb119-1"><a href="base.html#cb119-1"></a><span class="kw">f2</span>(<span class="st">&quot;mean&quot;</span>)</span>
<span id="cb119-2"><a href="base.html#cb119-2"></a><span class="co">#&gt; function (x, ...) </span></span>
<span id="cb119-3"><a href="base.html#cb119-3"></a><span class="co">#&gt; UseMethod(&quot;mean&quot;)</span></span>
<span id="cb119-4"><a href="base.html#cb119-4"></a><span class="co">#&gt; &lt;bytecode: 0x0000024974d6b258&gt;</span></span>
<span id="cb119-5"><a href="base.html#cb119-5"></a><span class="co">#&gt; &lt;environment: namespace:base&gt;</span></span>
<span id="cb119-6"><a href="base.html#cb119-6"></a><span class="kw">f2</span>(<span class="st">&quot;sum&quot;</span>)</span>
<span id="cb119-7"><a href="base.html#cb119-7"></a><span class="co">#&gt; function (..., na.rm = FALSE)  .Primitive(&quot;sum&quot;)</span></span>
<span id="cb119-8"><a href="base.html#cb119-8"></a><span class="kw">f2</span>(<span class="st">&quot;quantile&quot;</span>)</span>
<span id="cb119-9"><a href="base.html#cb119-9"></a><span class="co">#&gt; function (x, ...) </span></span>
<span id="cb119-10"><a href="base.html#cb119-10"></a><span class="co">#&gt; UseMethod(&quot;quantile&quot;)</span></span>
<span id="cb119-11"><a href="base.html#cb119-11"></a><span class="co">#&gt; &lt;bytecode: 0x0000024973ca7b58&gt;</span></span>
<span id="cb119-12"><a href="base.html#cb119-12"></a><span class="co">#&gt; &lt;environment: namespace:stats&gt;</span></span></code></pre></div>
<p>Every return value is a function, so can we call it directly?</p>
<div class="sourceCode" id="cb120"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb120-1"><a href="base.html#cb120-1"></a><span class="kw">f2</span>(<span class="st">&quot;mean&quot;</span>)(<span class="dv">1</span><span class="op">:</span><span class="dv">10</span>)</span>
<span id="cb120-2"><a href="base.html#cb120-2"></a><span class="co">#&gt; [1] 5.5</span></span></code></pre></div>
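<p>It turns out that we can.</p>
<p>Although these two short snippets only hint at what functional programming offers, it is easy to see how much flexibility it brings to programming in R.</p>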
</div>
<div id="apply-家族" class="section level4">
<h4><span class="header-section-number">2.3.2.2</span> The apply family</h4>
</div>
</div>
</div>
<div id="三方包的安装与加载" class="section level2">
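<h2><span class="header-section-number">2.4</span> Installing and loading third-party packages</h2>
<p>R ships with packages for basic computation, statistical analysis, and plotting, but these still cannot meet the varied needs of its many users. More than 10,000 third-party packages are currently distributed across CRAN, Bioconductor, GitHub, and other platforms, and each platform has its own installation method.</p>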
<div id="cran" class="section level3">
<h3><span class="header-section-number">2.4.1</span> CRAN</h3>
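<p>CRAN is an archive maintained by the R core team, and most R packages are published there. R has a built-in installation command, <code>install.packages()</code>.</p>
<p>Here is an example that installs the well-known plotting package <strong>ggplot2</strong>:</p>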
<div class="sourceCode" id="cb121"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb121-1"><a href="base.html#cb121-1"></a><span class="kw">install.packages</span>(<span class="st">&quot;ggplot2&quot;</span>)</span></code></pre></div>
<p>A local source package can be installed with the same command, for example the <strong>sigminer</strong> package stored on my machine:</p>
<div class="sourceCode" id="cb122"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb122-1"><a href="base.html#cb122-1"></a><span class="kw">install.packages</span>(<span class="st">&quot;../sigminer_1.0.0.tar.gz&quot;</span>, <span class="dt">repos =</span> <span class="ot">NULL</span>)</span></code></pre></div>
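<p>Calling <code>install.packages()</code> again reinstalls a package that is already present. As a minimal sketch, the hypothetical helper <code>install_if_missing()</code> below installs a package only when <code>requireNamespace()</code> cannot find it.</p>

```r
# Hypothetical helper: install a CRAN package only if it is not already installed.
# Returns TRUE (invisibly) if an installation was attempted, FALSE otherwise.
install_if_missing <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    return(invisible(FALSE))  # already installed, nothing to do
  }
  install.packages(pkg)
  invisible(TRUE)
}

# "stats" ships with R, so nothing is installed here.
install_if_missing("stats")
```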
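<p>CRAN uses an overseas mirror by default, so downloads may be slow for R users in mainland China. The <a href="https://mirrors.tuna.tsinghua.edu.cn/help/CRAN/">CRAN Tsinghua mirror</a> is recommended.</p>
<p>First, open the configuration file with this command:</p>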
<div class="sourceCode" id="cb123"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb123-1"><a href="base.html#cb123-1"></a><span class="kw">file.edit</span>(<span class="st">&quot;~/.Rprofile&quot;</span>)</span></code></pre></div>
<p>Then append the following line to that file:</p>
<div class="sourceCode" id="cb124"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb124-1"><a href="base.html#cb124-1"></a><span class="kw">options</span>(<span class="st">&quot;repos&quot;</span> =<span class="st"> </span><span class="kw">c</span>(<span class="dt">CRAN=</span><span class="st">&quot;https://mirrors.tuna.tsinghua.edu.cn/CRAN/&quot;</span>))</span></code></pre></div>
<p>Save the file and restart R.</p>
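<p>To try the mirror in the current session only, or to check which repository is in effect, something like the following sketch can be used. It sets the same option temporarily and then restores the previous value; the variable names <code>old</code> and <code>current</code> are illustrative.</p>

```r
# Set the mirror for this session and remember the previous setting.
old <- options(repos = c(CRAN = "https://mirrors.tuna.tsinghua.edu.cn/CRAN/"))

# Inspect the repository that install.packages() would now use.
current <- getOption("repos")[["CRAN"]]
current
#> [1] "https://mirrors.tuna.tsinghua.edu.cn/CRAN/"

# Restore the previous setting.
options(old)
```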
</div>
<div id="bioconductor" class="section level3">
<h3><span class="header-section-number">2.4.2</span> Bioconductor</h3>
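<p>Bioconductor is a bioinformatics project that hosts thousands of software packages, data packages, experiment packages, and more for the bioinformatics field. To install packages from Bioconductor, first install the <strong>BiocManager</strong> package:</p>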
<div class="sourceCode" id="cb125"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb125-1"><a href="base.html#cb125-1"></a><span class="kw">install.packages</span>(<span class="st">&quot;BiocManager&quot;</span>)</span></code></pre></div>
<p>Then use the <code>BiocManager::install()</code> function to install packages from Bioconductor, such as <strong>maftools</strong>:</p>
<div class="sourceCode" id="cb126"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb126-1"><a href="base.html#cb126-1"></a>BiocManager<span class="op">::</span><span class="kw">install</span>(<span class="st">&quot;maftools&quot;</span>)</span></code></pre></div>
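<p>Notably, this command can also install packages from CRAN.</p>
<p>Note that Bioconductor comes in different release versions. You can check which one you are using with the command below:</p>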
<div class="sourceCode" id="cb127"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb127-1"><a href="base.html#cb127-1"></a>BiocManager<span class="op">::</span><span class="kw">version</span>()</span>
<span id="cb127-2"><a href="base.html#cb127-2"></a><span class="co">#&gt; [1] &#39;3.10&#39;</span></span></code></pre></div>
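<p>Keeping the version as up to date as possible gives you the latest features and bug fixes of the related packages.</p>
<p>Bioconductor uses an overseas mirror by default, so downloads may be very slow for R users in mainland China. The <a href="https://mirrors.tuna.tsinghua.edu.cn/help/bioconductor/">Bioconductor Tsinghua mirror</a> is recommended.</p>
<p>First, open the configuration file with this command:</p>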
<div class="sourceCode" id="cb128"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb128-1"><a href="base.html#cb128-1"></a><span class="kw">file.edit</span>(<span class="st">&quot;~/.Rprofile&quot;</span>)</span></code></pre></div>
<p>Then append the following line to that file:</p>
<div class="sourceCode" id="cb129"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb129-1"><a href="base.html#cb129-1"></a><span class="kw">options</span>(<span class="dt">BioC_mirror=</span><span class="st">&quot;https://mirrors.tuna.tsinghua.edu.cn/bioconductor&quot;</span>)</span></code></pre></div>
<p>Save the file and restart R.</p>
</div>
<div id="github-等-git-库" class="section level3">
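<h3><span class="header-section-number">2.4.3</span> GitHub and other Git repositories</h3>
<p>GitHub is a well-known hosting platform for open-source software. It stores the source code of many R packages, including CRAN/Bioconductor packages, unreleased packages, and toy packages. As long as the source code has a valid R package structure, it can be installed with the <strong>remotes</strong> package.</p>
<p>For example, to install the development version of <strong>ggplot2</strong>:</p>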
<div class="sourceCode" id="cb130"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb130-1"><a href="base.html#cb130-1"></a>remotes<span class="op">::</span><span class="kw">install_github</span>(<span class="st">&quot;tidyverse/ggplot2&quot;</span>)</span></code></pre></div>
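<p>The package provides corresponding functions for other Git hosting services. If a Git host is not yet directly supported (such as China's gitee), <code>remotes::install_git()</code> can be used instead.</p>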
</div>
<div id="包使用" class="section level3">
<h3><span class="header-section-number">2.4.4</span> Using packages</h3>
<p>The packages attached when R starts can be listed with the <code>.packages()</code> command:</p>
<div class="sourceCode" id="cb131"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb131-1"><a href="base.html#cb131-1"></a><span class="kw">print</span>(<span class="kw">.packages</span>())</span>
<span id="cb131-2"><a href="base.html#cb131-2"></a><span class="co">#&gt; [1] &quot;stats&quot;     &quot;graphics&quot;  &quot;grDevices&quot; &quot;utils&quot;    </span></span>
<span id="cb131-3"><a href="base.html#cb131-3"></a><span class="co">#&gt; [5] &quot;datasets&quot;  &quot;pacman&quot;    &quot;methods&quot;   &quot;base&quot;</span></span></code></pre></div>
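<p>The configuration section of Chapter <a href="prepare.html#prepare">1</a> introduced the <strong>pacman</strong> package as a third-party package manager and set it up in <code>~/.Rprofile</code>, so that package is also loaded when R starts.</p>
<p>Details of the current R session (R version, platform, locale, and attached or loaded packages) can be obtained with <code>sessionInfo()</code>. Including its output when asking others for help is a good habit.</p>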
<div class="sourceCode" id="cb132"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb132-1"><a href="base.html#cb132-1"></a><span class="kw">sessionInfo</span>()</span>
<span id="cb132-2"><a href="base.html#cb132-2"></a><span class="co">#&gt; R version 3.6.3 (2020-02-29)</span></span>
<span id="cb132-3"><a href="base.html#cb132-3"></a><span class="co">#&gt; Platform: x86_64-w64-mingw32/x64 (64-bit)</span></span>
<span id="cb132-4"><a href="base.html#cb132-4"></a><span class="co">#&gt; Running under: Windows 10 x64 (build 18362)</span></span>
<span id="cb132-5"><a href="base.html#cb132-5"></a><span class="co">#&gt; </span></span>
<span id="cb132-6"><a href="base.html#cb132-6"></a><span class="co">#&gt; Matrix products: default</span></span>
<span id="cb132-7"><a href="base.html#cb132-7"></a><span class="co">#&gt; </span></span>
<span id="cb132-8"><a href="base.html#cb132-8"></a><span class="co">#&gt; locale:</span></span>
<span id="cb132-9"><a href="base.html#cb132-9"></a><span class="co">#&gt; [1] LC_COLLATE=Chinese (Simplified)_China.936 </span></span>
<span id="cb132-10"><a href="base.html#cb132-10"></a><span class="co">#&gt; [2] LC_CTYPE=Chinese (Simplified)_China.936   </span></span>
<span id="cb132-11"><a href="base.html#cb132-11"></a><span class="co">#&gt; [3] LC_MONETARY=Chinese (Simplified)_China.936</span></span>
<span id="cb132-12"><a href="base.html#cb132-12"></a><span class="co">#&gt; [4] LC_NUMERIC=C                              </span></span>
<span id="cb132-13"><a href="base.html#cb132-13"></a><span class="co">#&gt; [5] LC_TIME=Chinese (Simplified)_China.936    </span></span>
<span id="cb132-14"><a href="base.html#cb132-14"></a><span class="co">#&gt; </span></span>
<span id="cb132-15"><a href="base.html#cb132-15"></a><span class="co">#&gt; attached base packages:</span></span>
<span id="cb132-16"><a href="base.html#cb132-16"></a><span class="co">#&gt; [1] stats     graphics  grDevices utils     datasets </span></span>
<span id="cb132-17"><a href="base.html#cb132-17"></a><span class="co">#&gt; [6] methods   base     </span></span>
<span id="cb132-18"><a href="base.html#cb132-18"></a><span class="co">#&gt; </span></span>
<span id="cb132-19"><a href="base.html#cb132-19"></a><span class="co">#&gt; other attached packages:</span></span>
<span id="cb132-20"><a href="base.html#cb132-20"></a><span class="co">#&gt; [1] pacman_0.5.1</span></span>
<span id="cb132-21"><a href="base.html#cb132-21"></a><span class="co">#&gt; </span></span>
<span id="cb132-22"><a href="base.html#cb132-22"></a><span class="co">#&gt; loaded via a namespace (and not attached):</span></span>
<span id="cb132-23"><a href="base.html#cb132-23"></a><span class="co">#&gt;  [1] Rcpp_1.0.4          roxygen2_7.1.0     </span></span>
<span id="cb132-24"><a href="base.html#cb132-24"></a><span class="co">#&gt;  [3] bookdown_0.18       digest_0.6.25      </span></span>
<span id="cb132-25"><a href="base.html#cb132-25"></a><span class="co">#&gt;  [5] R6_2.4.1            magrittr_1.5       </span></span>
<span id="cb132-26"><a href="base.html#cb132-26"></a><span class="co">#&gt;  [7] evaluate_0.14       highr_0.8          </span></span>
<span id="cb132-27"><a href="base.html#cb132-27"></a><span class="co">#&gt;  [9] rlang_0.4.5         stringi_1.4.6      </span></span>
<span id="cb132-28"><a href="base.html#cb132-28"></a><span class="co">#&gt; [11] remotes_2.1.1       rstudioapi_0.11    </span></span>
<span id="cb132-29"><a href="base.html#cb132-29"></a><span class="co">#&gt; [13] xml2_1.2.5          rvcheck_0.1.8      </span></span>
<span id="cb132-30"><a href="base.html#cb132-30"></a><span class="co">#&gt; [15] rmarkdown_2.1       tools_3.6.3        </span></span>
<span id="cb132-31"><a href="base.html#cb132-31"></a><span class="co">#&gt; [17] stringr_1.4.0       purrr_0.3.3        </span></span>
<span id="cb132-32"><a href="base.html#cb132-32"></a><span class="co">#&gt; [19] xfun_0.12           yaml_2.2.1         </span></span>
<span id="cb132-33"><a href="base.html#cb132-33"></a><span class="co">#&gt; [21] compiler_3.6.3      BiocManager_1.30.10</span></span>
<span id="cb132-34"><a href="base.html#cb132-34"></a><span class="co">#&gt; [23] htmltools_0.4.0     knitr_1.28</span></span></code></pre></div>
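<p>Once a package is installed, there are two common ways to use its functions: attach the whole package with <code>library()</code>, or call a single function with the <code>::</code> operator. The sketch below uses <strong>tools</strong> and <strong>utils</strong>, which ship with R, so nothing needs to be installed; the file name is made up for the example.</p>

```r
# Attach a package so its exported functions can be called by bare name.
library(tools)
file_ext("report.tar.gz")
#> [1] "gz"

# Call a single function without attaching its package.
utils::head(letters, 3)
#> [1] "a" "b" "c"
```

<p>The <code>::</code> form makes it explicit where a function comes from and avoids name clashes between packages.</p>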
</div>
</div>
<div id="编程实战roc-曲线计算与绘制" class="section level2">
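<h2><span class="header-section-number">2.5</span> Programming in practice: computing and plotting a ROC curve</h2>
<p>In this section we bring together everything covered above through a worked computational example.</p>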
</div>
<div id="常见问题与方案-1" class="section level2">
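<h2><span class="header-section-number">2.6</span> Common problems and solutions</h2>
<p>Besides the problems currently listed in this section, any other problems readers run into while studying this chapter can be raised and discussed through a <a href="https://github.com/ShixiangWang/geek-r-tutorial/issues">GitHub Issue</a>. Questions of general interest will be added to this section.</p>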
<div id="与---的区别" class="section level3">
<h3><span class="header-section-number">2.6.1</span> The difference between = and &lt;-</h3>
</div>
<div id="因子重构" class="section level3">
<h3><span class="header-section-number">2.6.2</span> Rebuilding factors</h3>
<p>If we append two <code>M</code> values to the variable <code>sex</code>, we may get a puzzling result:</p>
<div class="sourceCode" id="cb133"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb133-1"><a href="base.html#cb133-1"></a>sex &lt;-<span class="st"> </span><span class="kw">factor</span>(<span class="kw">c</span>(<span class="st">&quot;Male&quot;</span>, <span class="st">&quot;Female&quot;</span>, <span class="st">&quot;Female&quot;</span>, <span class="st">&quot;Male&quot;</span>, <span class="st">&quot;Male&quot;</span>))</span>
<span id="cb133-2"><a href="base.html#cb133-2"></a>sex &lt;-<span class="st"> </span><span class="kw">c</span>(sex, <span class="kw">c</span>(<span class="st">&quot;M&quot;</span>, <span class="st">&quot;M&quot;</span>))</span>
<span id="cb133-3"><a href="base.html#cb133-3"></a>sex</span>
<span id="cb133-4"><a href="base.html#cb133-4"></a><span class="co">#&gt; [1] &quot;2&quot; &quot;1&quot; &quot;1&quot; &quot;2&quot; &quot;2&quot; &quot;M&quot; &quot;M&quot;</span></span></code></pre></div>
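<p>The root cause is that once a factor is created, what it actually stores is positive integers. The category labels are kept in its levels, and the mapping between integers and levels preserves the original information. When <code>c()</code> combines the factor with a character vector, it is these integer codes, not the labels, that are converted to strings.</p>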
<div class="sourceCode" id="cb134"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb134-1"><a href="base.html#cb134-1"></a>sex &lt;-<span class="st"> </span><span class="kw">factor</span>(<span class="kw">c</span>(<span class="st">&quot;Male&quot;</span>, <span class="st">&quot;Female&quot;</span>, <span class="st">&quot;Female&quot;</span>, <span class="st">&quot;Male&quot;</span>, <span class="st">&quot;Male&quot;</span>))</span>
<span id="cb134-2"><a href="base.html#cb134-2"></a><span class="kw">str</span>(sex)</span>
<span id="cb134-3"><a href="base.html#cb134-3"></a><span class="co">#&gt;  Factor w/ 2 levels &quot;Female&quot;,&quot;Male&quot;: 2 1 1 2 2</span></span></code></pre></div>
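<p>This design saves memory and helps with model fitting:</p>
<ul>
<li>Even when a vector contains a large number of strings, R only needs a small set of positive integers plus the levels to represent them.</li>
<li>Mathematical models do not work on strings directly. When a factor enters a statistical model, what takes part in the computation is a numeric encoding derived from its integer codes.</li>
</ul>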
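<p>The mapping between codes and levels can be inspected directly. The following sketch uses only base R functions and the same <code>sex</code> factor as above:</p>

```r
sex <- factor(c("Male", "Female", "Female", "Male", "Male"))

# The integer codes stored in the factor.
as.integer(sex)
#> [1] 2 1 1 2 2

# The levels that the codes index into.
levels(sex)
#> [1] "Female" "Male"

# Indexing the levels by the codes recovers the original labels.
identical(levels(sex)[as.integer(sex)], as.character(sex))
#> [1] TRUE
```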
<p>One way to solve the problem above is to convert <code>sex</code> back to a character vector first and then create the factor again.</p>
<div class="sourceCode" id="cb135"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb135-1"><a href="base.html#cb135-1"></a>sex &lt;-<span class="st"> </span><span class="kw">factor</span>(<span class="kw">c</span>(<span class="kw">as.character</span>(sex), <span class="st">&quot;M&quot;</span>, <span class="st">&quot;M&quot;</span>))</span>
<span id="cb135-2"><a href="base.html#cb135-2"></a>sex</span>
<span id="cb135-3"><a href="base.html#cb135-3"></a><span class="co">#&gt; [1] Male   Female Female Male   Male   M      M     </span></span>
<span id="cb135-4"><a href="base.html#cb135-4"></a><span class="co">#&gt; Levels: Female M Male</span></span></code></pre></div>

</div>
</div>
</div>
            </section>

          </div>
        </div>
      </div>
<a href="prepare.html" class="navigation navigation-prev " aria-label="Previous page"><i class="fa fa-angle-left"></i></a>
<a href="import.html" class="navigation navigation-next " aria-label="Next page"><i class="fa fa-angle-right"></i></a>
    </div>
  </div>
<script src="libs/gitbook/js/app.min.js"></script>
<script src="libs/gitbook/js/lunr.js"></script>
<script src="libs/gitbook/js/clipboard.min.js"></script>
<script src="libs/gitbook/js/plugin-search.js"></script>
<script src="libs/gitbook/js/plugin-sharing.js"></script>
<script src="libs/gitbook/js/plugin-fontsettings.js"></script>
<script src="libs/gitbook/js/plugin-bookdown.js"></script>
<script src="libs/gitbook/js/jquery.highlight.js"></script>
<script src="libs/gitbook/js/plugin-clipboard.js"></script>
<script>
gitbook.require(["gitbook"], function(gitbook) {
gitbook.start({
"sharing": {
"github": true,
"facebook": true,
"twitter": true,
"linkedin": false,
"weibo": false,
"instapaper": false,
"vk": false,
"all": ["facebook", "twitter", "linkedin", "weibo", "instapaper"]
},
"fontsettings": {
"theme": "white",
"family": "sans",
"size": 2
},
"edit": {
"link": "https://github.com/ShixiangWang/geek-r-tutorial/edit/master/02-base.Rmd",
"text": "Edit"
},
"history": {
"link": null,
"text": null
},
"view": {
"link": null,
"text": null
},
"download": null,
"toc": {
"collapse": "section"
}
});
});
</script>

<!-- dynamically load mathjax for compatibility with self-contained -->
<script>
  (function () {
    var script = document.createElement("script");
    script.type = "text/javascript";
    var src = "true";
    if (src === "" || src === "true") src = "https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-MML-AM_CHTML";
    if (location.protocol !== "file:")
      if (/^https?:/.test(src))
        src = src.replace(/^https?:/, '');
    script.src = src;
    document.getElementsByTagName("head")[0].appendChild(script);
  })();
</script>
</body>

</html>
